Edited by
Donald G. Paterson 


University of Minnesota 


Consulting Editors


George K. Bennett, Psychological Corporation
Harold E. Burtt, Ohio State University
Allen L. Edwards, University of Washington
Clifford E. Jurgensen, Minneapolis Gas Co.
Irving Lorge, T. C., Columbia University
Quinn McNemar, Stanford University
Alexander Mintz, City College of New York
James P. Porter, Claverack, New York
Harold F. Rothe, Fairbanks, Morse and Co., Beloit, Wis.
Julian B. Rotter, Ohio State University
Edward K. Strong, Jr., Stanford University
Donald E. Super, T. C., Columbia University
Morris S. Viteles, University of Pennsylvania
Alfred C. Welch, Knox-Reeves, Minneapolis


Volume 38, 1954 




Published Bi-monthly by the American Psychological Association, Inc.
Prince and Lemon Sts., Lancaster, Pa.


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879


Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.4,
P. L. and R. of 1948, authorized October 10, 1947


Copyright 1954 by the American Psychological Association, Inc. 




Contents of Volume 38 


Articles 

Andrews, T. G., Smith, D. D. and Kahn, L. A. An Empirical Analysis of the
    Effectiveness of Psychological Warfare ........ 240
Anikeeff, A. M. Attitudes on Social Issues of Business Administrators and Students
    in a School of Business Administration ........ 407
Anikeeff, A. M. Index of Collaboration for Test Administrators ........ 174
Anikeeff, A. M. Scholastic Achievement of Extension and Regular College Students ........ 171
Appel, V. and Kipnis, D. The Use of Levels of Confidence in Item Analysis ........ 256
Ash, P. Reliability and Validity of the Kopas Personnel Test Battery ........ 155
Balinsky, B. and Hujsa, C. Performance of College Students on a Mechanical
    Knowledge Test ........ 111
Bayton, J. A. and Thomas, C. M. Comparative and Single Stimulus Methods in
    Determining Taste Preferences ........ 443
Bendig, A. W. Reliability and the Number of Rating Scale Categories ........ 38
Bendig, A. W. Reliability of Short Rating Scales and the Heterogeneity of the
    Rated Stimuli ........ 167
Bendig, A. W. and Sprague, J. L. The Guilford Zimmerman Temperament Survey
    as a Predictor of Achievement Level and Achievement Fluctuation in Introductory
    Psychology ........ 409
Bernberg, R. E. Personality Correlates of Social Conformity ........ 148
Bernstein, L. An Application of Rogerian Concepts to Nurse-Patient Relationships ........ 324
Brayfield, A. H., Kennedy, Jr., C. E. and Kendall, W. E. Social Status of Industries ........ 213
Briggs, S. J., McCormick, E. J. and Kephart, N. C. The Effect of Hammer Size on
    Efficiency in the Task of Nailing ........ 1
Browne, R. C. Figure and Ground in a Two Dimensional Display ........ 462
Bruce, M. M. A Sales Comprehension Test ........ 302


SEENE SNE tni A ni-n alemyee dic 48-02 o\ sini geitaciuniie desord acwamanabinn Geden 85 
Clark, K. E. and Gee, H. H. Selecting Items for Interest Inventory Keys.......... 12 
Clements, F. E., Bayton, J. A. and Bell, H. P. Method of Single Stimulus
    Determinations of Taste Preferences ........ 446
Comrey, A. L. and Deskin, G. Further Results on Group Manual Dexterity in Men ........ 116
Comrey, A. L. and Deskin, G. Group Manual Dexterity in Women ........ 178
Cook, W. W. and Medley, D. M. Proposed Hostility and Pharisaic-Virtue Scales

me ae ps Note on Age and Productive Scholarship of a University Faculty...... 318 


Dingman, H. F. and Guilford, J. P. A New Method for Obtaining Weighted Com- 


Di Ven eo eae of ‘Methods of Presentation and Examining Conditions on 


i in a Correspondence Course................ pense sees 
Di tors, "E r aered and Student-Centered Approaches in Teaching 


x Coes a nna piira Smeets ed eee E ETTET ELETA 
Di ee o ee on ACE Psychological Examination Related to 


Educational and Occupational Differences.........-.-- 0+ +++ +0 sees eee eee 248 




Falk, G. H. and Bayroff, A. G. Rater and Technique Contamination in Criterion
Ratings


Garvey, W. D. and Knowles, W. B. Pointing Accuracy of a Joy Stick Without 
Visual Feedback 


Glaser, R. and Jacobs, O. Predicting Achievement in Medical School: A Compari- 
son of Preclinical and Clinical Criteria 


Grace, H. A. Facilitating Legislative Research................................ 


Graham, W. R. Identification and Prediction of Two Training Criterion Factors. . . 


Guilford, J. P. The Validation of an “Indecision” Score for Prediction of Proficiency 
of Foremen 


Gustad, J. W. 


Criterion in Naval Air MPAs vow se kd viiwnacdinddndcigeon aivan nneneceoervn 
Hollander, E. P. and Bair, J. T. Attitudes Toward Authority-Figures as Correlates 
of Motivation Among Naval Aviation Cadets 


Holmen, M. G. The Specialization Level Scale for the Strong Vocational Interest
Blank 


Jenkins, W. L. and Karr, 
Simulated Soape Fate: 004 bs manmrnoariinau2464404¥4a10000010cecnucc 

Johnson, R. J. Relations 

Kephart, N. C. and Deu 
ia eri AOR. 

Layton, W. L. The Relat 


Lincoln, R. S. Rate Accuracy in Handwheel Cranking P ee ee 
Littman, R. A. and Manning, H. M. 


Lockman, R.F. Some Relationships Between the MMPI and a Problem Checklist. 
Longstaff, H. P. Practice 


i ea aiad eostas I T ERETO Clenical 
MacKinney, A. C. and Jenkins, J. J. Readability of Employee’s Letters in Relation
Fe Anoa evela aa aasian pi tdinmmanecss oa a eS Agi AAon 
MacLean, A. G. and Tait, A. T. Some Computational Short-Cuts in the Develop- 
ment or Analysis of Tests 


MacPhail, A. H. Interest Patterns for Certain Degree Groups on the Lee-Thorpe 
Occupational Interest Inventory...... 


Maloney, P. W. Comparability of Personal Attitude Scale Administration with Mail
Administration with and without Incentive 




Marchetti, P. V. Manager-Employee “Understanding” in the Retail Grocery and
Meat Markets
Mason, H. M. A Comparative Evaluation of Two Approaches to Job-Knowledge
Test Construction


Meyer, H. D. and Pressel, G. L. Personality Test Scores in the Management 


Minnesota State Employment Service in Cooperation with the U. S. Employment 
Service, U. S. Department of Labor, Washington, D. C. Standardization of the 
GATB for the Occupation of Tabulating Machine Operator

Mintz, A. The Inference of Accident Liability from the Accident Record

Mintz, A. Time Intervals Between Accidents


Be Added to an Existing Test Battery
Newman, S. H. Quantitative Analysis of Verbal Evaluations


Siegel, A. I. The Check List as a Criterion of Proficiency
Siegel, A. I. An Experimental Evaluation of the Sensitivity of the Empathy Test


Strong, Jr., E. K. Validity versus Reliability
Teel, K. S. and Du Bois, P. H. Psychological Research on Accidents: Some
Methodological Considerations
Tiffin, J. and Winick, D. M. A Comparison of Two Methods of Measuring the
Attention-Drawing Power of Magazine Advertisements




Tinker, M. A. Readability of Mathematical Tables
Van Zelst, R. H. The Effect of Age and Experience upon Accident Rate
Van Zelst, R. H. and Kerr, W. A. Personality Self-Assessment of Scientific and
Technical Personnel
Washburne, N. F. and Andrew, D. C. Relation of Scholastic Aptitude to
Socio-economic Status and to a Rural-to-Urban Continuum
Wexner, L. B. The Degree to Which Colors (Hues) Are Associated with
Mood-Tones
Wilson, R. C. and Comrey, A. L. A Short Method of Factor BERGE «oo ccensne ss E 
Wilson, R. C., High, W. S., Beem, H. P. and Comrey, A. L. A Factor-Analytic
Study of Supervisory and Group Behavior
Witryol, S. L. Scaling Procedures Based on the Method of Paired Comparisons


Wood, T. L. The Relationship Between Mechanical Aptitude and Proficiency 
Tests for Air Force Mechanics 


Book Reviews


Anonymous. Army Personnel Tests and Measurements: Harold E. Burtt 
Berdie’s Roles and Relationships in Counseling: Arthur H. Brayfield
Bross’ Design for Decision: Allen L. Edwards 


MOIS re 
sychological Scaling: Marvin D. Dunnette 
ogy of Successful Selling: Brent Baxter 
Illuminating Engineering Society's Recommended Practice for 


Residence Lighting: 


Jahoda, Deutsch, and Cook's Research Methods in Social Relations, with Especial 
Reference to Prejudice; Vol. I: Basic Processes; Vol. II: Selected Techniques: 
Harrison G. Gough.......... 


an Female; 
and Corner’s Twenty-five 


Years of Sex. arch Council Committee: 


Donald G. Paterson 


Lincoln's Incentive Management: Albert S. Thompson
Lundin’s An Objective Psychology of Music: Kate Hevner Mueller
Marketing and Social Research Division of the Psychological Corporation’s The
Measured Effectiveness of Employee Publications: Donald G. Paterson
Comment on Preceding Review: Charles L. Vaughn
McFarland’s Human Factors in Air Transportation:
Montagu’s The Natural Superiority 
New York Academy of Medicine an 
tions of the Conference on) 
Clark L. Hosmer... , 
Personality : Symposi 
Prank A. Patties ses o.c0<ceieestcebnaemmnenss n. 
Powers and Witmer’s An Experiment in the Prevention of ou ccc ia 


ation’s (Transac- 
and Control of Panic: 




Remmers’ Introduction to Opinion and Attitude Measurement: Sidney S. Goldish ........ 377
Schlotter aud Svendsen’s An Experiment in Recreation with the Mentally Re- 


BMGs RIRO FE: RGU e Ebb sron nera n secu he arent oa 379 
Sherif and Wilson's Group Relations at the Crossroads: Bernard M. Bass ........ 378
Traxler, Jacobs, Selover, and Townsend’s Introduction to Testing and the Use of Test
    Results in Public Schools: Marjorie Olsen ........ 68
Tuckman and Lorge’s Retirement and the Industrial Worker: Prospect and Reality:
    Marvin D. Dunnette ........ 375
Tyler's The Work of the Counselor: Donald E. PN acs o 8S AEE wee A 139 
Viteles’ Motivation and Morale in Industry: Clifford E. Jurgensen............... 136 
Woolf and Woolf’s The Student Personnel Program: John W. Gustad ........ 206


Applied Psychology in Action 


Colmen, J. G. Psychological Research in Personnel Administration ........ 61
Dahlstrom, W. G. Personnel Psychology and Small Business ........ 203
Dvorak, B. J. GATB in Foreign Countries ........ 373
Epstein, M. A Note on “The Non-Directive Approach in Advertising Appeals” ........ 133
Hadley, H. D. Reply to Dr. Wells and to Miss Epstein ........ 202
Jurgensen, C. E. Reporting Employment Test Scores to Supervisors
Kerr, W. The Measurement of Academic Freedom ........ 134
Knauft, E. B. Time Limit versus Work Limit Methods of Test Administration ........ 62
Murrell, K. F. H. Note on the Work of the British Standards Institution ........ 202
Wells, F. L. Comment on Word Meaning ........ 133
Employee Opinion Surveys ........ 63
Legal Status of Advertising and Marketing Psychology Experts ........ 276


Miscellaneous


New Books, Monographs, and Pamphlets.................. 71, 143, 210, 282, 380, 468 


Journal of Applied Psychology


Vol. 38, No. 1


FEBRUARY, 1954 


The Effect of Hammer Size on Efficiency in the Task of Nailing * 


Stewart J. Briggs, E. J. McCormick, and N. C. Kephart 


Occupational Research Center, Purdue University 




Any hardware store salesman “knows”
what size and type of hammer to use with 
different sizes and types of nails. On the ba- 
sis of the intuitive knowledge that impreg- 
nates the atmosphere of any hardware store, 
the salesman will sell to the home craftsman 
a small hammer to use with small nails and 
a large hammer to use with larger nails. 
There has apparently never been any em- 
pirical evidence, however, to verify or deny 
the salesman’s judgment on these matters. 
This study was designed to provide at least 
a fragment of such empirical evidence. 

More specifically the investigation was car- 
ried out to determine the relationships, in 
terms of efficiency in nailing, between sizes 
and types of hammers and sizes and types of 
nails as used by home craftsmen. Six ham- 
mers were used in the experiment, four of 
them being claw hammers and two rip ham- 
mers. Five sizes of finishing nails and five 
sizes of common nails were used. 


Experimental Procedures 


While it would have been desirable to es- 
tablish conditions that simulated those which 
the home craftsman would meet, it was not 
possible to accomplish this objective entirely 
because of the need to exercise experimental 
controls. 


Pilot Study. A pilot study was carried out 
with one subject. On the basis of the pilot 
study, certain observations were made and these 
were used in developing the procedures for the 
experiment proper. Following are the observa- 
tions that resulted from the pilot study: 


* Appreciation is expressed to Mr. L. A. O'Connor, 
Store Manager, and Mr. Myron Burkenpas, Man- 
ager of the Hardware Department, Sears Roebuck 
and Company, Lafayette, Indiana, for the loan of 
the hammers for this experiment. 


1. The measured time of the task was a more 
suitable criterion of performance than number of 
strikes of hammer since it takes into account the 
effect of bent nails. 

2. The wood used should be of uniform grain 
and of medium hardness. 

3. The optimum number of nails to be driven 
for each nail-hammer combination was about 
three. 

4. Rest periods were necessary to reduce vari- 
ance due to fatigue. 

Subjects. Six subjects were used in the ex- 
periment. All of the subjects selected had had 
experience as home craftsmen, yet not as profes- 
sional carpenters. The subjects were all males 
between the ages of 21 and 39 years, and were 
associated with Purdue University; one was a 
professor of psychology, four were graduate stu- 
dents in psychology, and one was an undergradu- 
ate in the field of engineering. 

Materials. The following materials were used: 

1. Six hammers were used: four were classi- 
fied commercially as 7, 10, 13, and 16 oz. claw 
hammers; and two were 16 and 20 oz. rip ham- 
mers. The hammers were marked with letters 
for identification. It should be noted that weight 
size refers to the weight of the hammer head, 
and the terms “claw” and “rip” refer to the shape 
of the head. 

2. Nails of the following types were used: 4, 
6, 8, 10, and 16 penny common wire nails, and 
2, 4, 6, 8, and 10 penny wire finishing nails. The 
nails varied in length by half inch intervals and 
increased in gauge with the larger sizes. The 
finishing nails were of smaller gauge than their 
penny equivalents in common nails. 

3. One eight foot top grade fir 2 X 4 per sub- 
ject. 

4. Two sawhorses approximately 34 inches in 
height equipped with a wooden groove to hold 
the 2 X 4 in place during the experiment. 

5. Nail containers: one wooden nail bin of
nine compartments and one can for holding the
largest nails. Each container was marked with
the size of the nail it contained.

6. One table on which the nail bins were placed,
positioned to place the nails within easy reach of
the subject.


7. One stop watch calibrated in hundredths of 
a minute. 

Warm-up Period. Each subject was allowed a 
short warm-up period during which he drove up 
to a total of ten nails of various sizes using three 
or four different hammers. This warm-up pe- 
riod ended when the subject said he was ready 
to begin the experiment. 

Experimental Sequence. Each subject drove 
nails of each of the ten types and sizes with 
each of the six hammers, making a total of sixty 
combinations of nails and hammers for each sub- 
ject. A deck of sixty IBM cards was prepared 
for each subject, each card representing a com- 
bination of one nail type and size and one ham- 
mer. These cards were thoroughly shuffled, and 
the subject followed the randomized order that 
resulted from this shuffling; as the subject com- 
pleted any one combination of nail and hammer, 
the experimenter would tell him what combina- 
tion to use next. Three nails of each type and 
size were driven by the subject. The experi- 
menter timed the subject on the total time re- 
quired to drive the three nails from the time the 
subject grasped the first nail until he had com- 
pleted driving the third one. The time records 
were recorded on the cards. (The time records 
were later punched into these cards for use in 
statistical analysis.) 
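
The card-shuffling procedure described above can be illustrated with a short sketch. This is not the authors' procedure (they shuffled a physical deck of IBM cards); it only shows, under that assumption, how a random order over the sixty hammer-nail combinations could be generated today. The names `build_deck` and `shuffled_deck` and the seed value are hypothetical.

```python
import itertools
import random

# The six hammers and ten nails named in the Materials section.
HAMMERS = ["7 oz. claw", "10 oz. claw", "13 oz. claw",
           "16 oz. claw", "16 oz. rip", "20 oz. rip"]
NAILS = ([f"{penny} penny common" for penny in (4, 6, 8, 10, 16)]
         + [f"{penny} penny finishing" for penny in (2, 4, 6, 8, 10)])
NAILS_PER_COMBINATION = 3  # each timed set consisted of three nails


def build_deck():
    """Return one card (hammer, nail) for each of the 6 x 10 combinations."""
    return list(itertools.product(HAMMERS, NAILS))


def shuffled_deck(seed=None):
    """Return the deck in random order; `seed` is only for reproducibility."""
    rng = random.Random(seed)
    deck = build_deck()
    rng.shuffle(deck)
    return deck


# One deck per subject; the seed here is an arbitrary illustrative value.
deck = shuffled_deck(seed=1954)
total_nails = len(deck) * NAILS_PER_COMBINATION  # nails driven per subject

for hammer, nail in deck[:3]:
    print(f"Next: {hammer} hammer, {nail} nails")
```

Drawing a fresh shuffle for each subject, as the authors did with separate decks, keeps order effects such as fatigue or practice from being tied to any particular hammer-nail combination.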

Short rest pauses of approximately one-half 
minute were introduced between each of the 
sixty combinations. Two longer rest periods of 
ten minutes divided the experimental time into 
roughly three equal intervals. 

Instructions to the Subject. The following in- 
structions were given to the subject: 

“The task involves driving nails into this 2 x 4.
You are to drive the nails in sets of three; that 
is, I will measure the time from the instant you 
grasp the first nail until you have finished driv- 
ing the third. Drive each nail until its head is 
flush with the board before driving the next. 
Try not to mar the wood. If the nail starts to 
bend, try to correct it; and if it seems too bent, 
pull it out and use another nail in its place. 

“Before each set I will tell you which hammer 
and nail to use. These are identified by the let- 
ters on the hammer and the numbers on the com- 
partments of the nail bins. You will then select 
the proper hammer and hold it in the hand you 
wish to hammer with. When you have located 
the proper nail bin, say ready after which I will 
say go. Then grasp the nail and start hammer- 
ing. Drive the nails as fast as possible, remem- 
bering that bent nails will slow you up. Are 
there any questions?” 


Results 


The data were treated statistically using an
analysis of variance to identify significant
variables and interactions. The data were
further treated using the process described
by Tukey ¹ to break up the data into
significantly different groups.


Table 1

Analysis of Variance

                                   Mean
Source                             Square    d.f.    F
Subjects                            155.1       5
Hammers                             734.3       5    16.9**
Nails                              2344.4       9    50.3
Hammers X Subjects                   43.4      25     1.08
Nails X Subjects                     46.7      45     1.16
Hammers X Nails                      73.0      45     1.82
Hammers X Nails X Subjects           40.2     225
Total                                         359

Special Analyses

                                   Mean
Source                             Square    d.f.    F
Hammers
  16 oz. claw vs. 16 oz. rip         6.07       1
Nails (4, 6, 8, 10 penny)
  Finishing vs. Common              417.9       1     8.9

** Denotes significance at the 1% level of confidence.


Analysis of Variance. The results of the
analysis of variance are presented in Table 1.
These findings may be interpreted as showing
that the variance among the different sizes of
hammers, as well as among the different sizes
of nails, is statistically significant. Only one
claw hammer and one rip hammer were of
comparable size (16 oz.) and they were found
not to be significantly different. There was a
significant difference between the two types
of nails (common and finishing) when only
those sizes represented in both types (4, 6, 8,
10 penny) were considered. The finishing
nails were driven more slowly than were their
penny equivalents in common nails.
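
The two single-degree-of-freedom comparisons in the "Special Analyses" part of Table 1 can be checked against the mean times in Table 2. The sketch below rests on two assumptions that the article does not state: that each tabled mean is the average of the six subjects' times for that combination, and that the comparison sum of squares is the usual n(difference)²/2 for two means each based on n observations. The 6 C, 16 oz. claw entry is taken as 26.50, the value listed in Table 3. The names `MEAN_TIME` and `contrast_mean_square` are hypothetical.

```python
# Mean times transcribed from Table 2; rows are nails, columns are hammers.
HAMMERS = ["7 Claw", "10 Claw", "13 Claw", "16 Claw", "16 Rip", "20 Rip"]
MEAN_TIME = {
    "4C":  [25.17, 25.33, 23.50, 21.17, 23.67, 23.17],
    "6C":  [33.00, 31.83, 27.50, 26.50, 26.83, 26.83],
    "8C":  [40.17, 36.33, 30.67, 31.17, 34.00, 31.33],
    "10C": [52.50, 45.17, 37.00, 34.00, 36.67, 34.17],
    "16C": [66.67, 54.33, 47.00, 47.17, 46.17, 41.50],
    "2F":  [26.33, 26.00, 26.00, 29.50, 22.67, 34.00],
    "4F":  [30.00, 29.17, 28.67, 28.00, 24.17, 27.00],
    "6F":  [33.50, 35.50, 30.33, 28.67, 29.83, 28.33],
    "8F":  [39.33, 40.83, 34.83, 33.67, 33.17, 33.33],
    "10F": [51.83, 43.67, 37.33, 38.00, 36.17, 40.17],
}
N_SUBJECTS = 6  # assumed number of observations behind each tabled mean


def contrast_mean_square(cells_a, cells_b):
    """Mean square (1 d.f.) for the difference between two sets of cell means."""
    assert len(cells_a) == len(cells_b)
    n = len(cells_a) * N_SUBJECTS  # observations behind each overall mean
    diff = sum(cells_a) / len(cells_a) - sum(cells_b) / len(cells_b)
    return n * diff ** 2 / 2


# Finishing vs. common nails, restricted to the 4, 6, 8, and 10 penny sizes.
finishing = [t for nail in ("4F", "6F", "8F", "10F") for t in MEAN_TIME[nail]]
common = [t for nail in ("4C", "6C", "8C", "10C") for t in MEAN_TIME[nail]]
ms_nails = contrast_mean_square(finishing, common)

# 16 oz. claw vs. 16 oz. rip, across all ten nails.
claw = [row[HAMMERS.index("16 Claw")] for row in MEAN_TIME.values()]
rip = [row[HAMMERS.index("16 Rip")] for row in MEAN_TIME.values()]
ms_hammers = contrast_mean_square(claw, rip)

print(f"Finishing vs. common mean square: {ms_nails:.1f}")
print(f"16 oz. claw vs. rip mean square:  {ms_hammers:.3f}")
```

Under these assumptions both values come out close to the mean squares of 417.9 and 6.07 reported in Table 1, which suggests the tabled means and the special analyses are mutually consistent.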

There was no significant interaction either
between hammers and subjects or between
nails and subjects. There was found a
significant variance ratio in the hammer by nail
interaction. That is, certain hammer and
nail combinations can be considered better
than others when driving time is used as a
criterion. To locate these specific
combinations, the Tukey process was used.
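
The variance ratios in Table 1 can be recomputed from the tabled mean squares. The choice of error term for each source is not stated in the article; the pairing used below (main effects tested against their interaction with subjects, interactions tested against the three-way term) is an inference from the arithmetic and should be read as an assumption. The names `MEAN_SQUARES`, `ERROR_TERM`, `f_ratio`, and `degrees_of_freedom` are hypothetical.

```python
# Mean squares transcribed from Table 1.
MEAN_SQUARES = {
    "Subjects": 155.1,
    "Hammers": 734.3,
    "Nails": 2344.4,
    "Hammers x Subjects": 43.4,
    "Nails x Subjects": 46.7,
    "Hammers x Nails": 73.0,
    "Hammers x Nails x Subjects": 40.2,
}

# Assumed error term for each tested source (inferred, not stated in the text).
ERROR_TERM = {
    "Hammers": "Hammers x Subjects",
    "Nails": "Nails x Subjects",
    "Hammers x Subjects": "Hammers x Nails x Subjects",
    "Nails x Subjects": "Hammers x Nails x Subjects",
    "Hammers x Nails": "Hammers x Nails x Subjects",
}


def f_ratio(source):
    """Variance ratio of a source against its assumed error term."""
    return MEAN_SQUARES[source] / MEAN_SQUARES[ERROR_TERM[source]]


def degrees_of_freedom(subjects=6, hammers=6, nails=10):
    """Degrees of freedom for a subjects x hammers x nails design, one score per cell."""
    s, h, n = subjects - 1, hammers - 1, nails - 1
    return {
        "Subjects": s,
        "Hammers": h,
        "Nails": n,
        "Hammers x Subjects": h * s,
        "Nails x Subjects": n * s,
        "Hammers x Nails": h * n,
        "Hammers x Nails x Subjects": h * n * s,
    }


df = degrees_of_freedom()
for source in ERROR_TERM:
    print(f"{source:22s} d.f. = {df[source]:3d}   F = {f_ratio(source):6.2f}")
print("Total d.f. =", sum(df.values()))
```

With these pairings the recomputed ratios agree with Table 1 to within rounding of the tabled mean squares, and the degrees of freedom sum to the tabled total of 359.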


¹ Tukey, J. W. Comparing individual means in
the analysis of variance. Biometrics, 1949, 5, 99-


Table 2

Hammer Size Groups for Specific Nails

                  Mean   Sub-                       Mean   Sub-
Nail   Hammer     Time   group     Nail   Hammer    Time   group
4 C    16 Claw   21.17   *         2 F    16 Rip    22.67   *
4 C    20 Rip    23.17   *         2 F    13 Claw   26.00   *
4 C    13 Claw   23.50   *         2 F    10 Claw   26.00   *
4 C    16 Rip    23.67   *         2 F     7 Claw   26.33   *
4 C     7 Claw   25.17   *         2 F    16 Claw   29.50   *
4 C    10 Claw   25.33   *         2 F    20 Rip    34.00   *
6 C    16 Claw   26.50   *         4 F    16 Rip    24.17   *
6 C    20 Rip    26.83   *         4 F    20 Rip    27.00   *
6 C    16 Rip    26.83   *         4 F    16 Claw   28.00   *
6 C    13 Claw   27.50   *         4 F    13 Claw   28.67   *
6 C    10 Claw   31.83   *         4 F    10 Claw   29.17   *
6 C     7 Claw   33.00   *         4 F     7 Claw   30.00   *
8 C    13 Claw   30.67   *         6 F    20 Rip    28.33   *
8 C    16 Claw   31.17   *         6 F    16 Claw   28.67   *
8 C    20 Rip    31.33   *         6 F    16 Rip    29.83   *
8 C    16 Rip    34.00   *         6 F    13 Claw   30.33   *
8 C    10 Claw   36.33   *         6 F     7 Claw   33.50   *
8 C     7 Claw   40.17   *         6 F    10 Claw   35.50   *
10 C   16 Claw   34.00   I         8 F    16 Rip    33.17   *
10 C   20 Rip    34.17   I         8 F    20 Rip    33.33   *
10 C   16 Rip    36.67   I         8 F    16 Claw   33.67   *
10 C   13 Claw   37.00   I         8 F    13 Claw   34.83   *
10 C   10 Claw   45.17   I         8 F     7 Claw   39.33   *
10 C    7 Claw   52.50   II        8 F    10 Claw   40.83   *
16 C   20 Rip    41.50   I         10 F   16 Rip    36.17   I
16 C   16 Rip    46.17   I         10 F   13 Claw   37.33   I
16 C   13 Claw   47.00   I         10 F   16 Claw   38.00   I
16 C   16 Claw   47.17   I         10 F   20 Rip    40.17   I
16 C   10 Claw   54.33   I         10 F   10 Claw   43.67   I
16 C    7 Claw   66.67   II        10 F    7 Claw   51.83   II

Legend:
Nail Type:
  Number refers to penny size (2 = 2 penny, 4 = 4 penny, etc.).
  C = Common wire nail; F = Finishing wire nail.
Hammers:
  Number refers to size (16 = 16 oz. hammer, etc.).
Subgroups:
  *  = No significantly different subgroups formed (at 1% level).
  I  = Subgroup with significantly faster driving time (at 1% level).
  II = Subgroup with significantly slower driving time (at 1% level).


Tukey Process.² The data presented in
Tables 2 and 3 represent the results of
employing the technique developed by Tukey
for dividing a group of means into
significantly different subgroups. Table 2 shows
for each nail size the subgroups that
occurred between the means of the different
hammers, i.e., given a nail of a certain type
and size, which hammer or hammers are the
best? Table 3 shows the subgroups occurring
between nails when each hammer was used.
As the number of subgroups formed is
dependent upon the variance of the whole
group, the number of subgroups is not
constant.


² Tukey, J. W. Op. cit.




In Tables 2 and 3 the significantly differ- 
ent subgroups are identified with Roman 
numerals. With two subgroups numerals I 
and II are used; with three subgroups, nu- 
merals I, II, and III. An asterisk (*) is used 
where no subgroups were found. 
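
The labelling convention just described can be illustrated with a deliberately simplified sketch. It is not Tukey's procedure, which derives its critical values from the error variance and applies further tests; here a new subgroup is simply started wherever the gap between adjacent ordered means exceeds a threshold supplied by the caller. The function name `label_subgroups` and the threshold of 10.0 are hypothetical and chosen only for illustration.

```python
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]


def label_subgroups(means, critical_gap):
    """Label ordered means with Roman numerals, or '*' if no split occurs.

    A new subgroup starts whenever two adjacent ordered means differ by more
    than `critical_gap`. This is a simplified stand-in, not Tukey's method.
    """
    ordered = sorted(means)
    group = 0
    groups = [0]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > critical_gap:
            group += 1
        groups.append(group)
    if group == 0:
        return [(mean, "*") for mean in ordered]
    return [(mean, ROMAN[g]) for mean, g in zip(ordered, groups)]


# Mean times for the 16 penny common nail and the 4 penny common nail (Table 2).
times_16c = [41.50, 46.17, 47.00, 47.17, 54.33, 66.67]
times_4c = [21.17, 23.17, 23.50, 23.67, 25.17, 25.33]

labels_16c = label_subgroups(times_16c, critical_gap=10.0)
labels_4c = label_subgroups(times_4c, critical_gap=10.0)

for mean, label in labels_16c:
    print(f"{mean:6.2f}  {label}")
```

With this illustrative threshold the 16 penny common nail splits into a faster subgroup of five hammers and a slower subgroup of one, while the 4 penny common nail forms no subgroups and is marked with asterisks.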


On the basis of the subgroups that were
found, it is possible to make certain general
recommendations with regard to hammer-nail
combinations. In Table 2 it will be noted
that for three nail sizes significantly different
subgroups of hammers were formed; in each
such case the hammers in subgroup I are to
be recommended over those in subgroup II.


Table 3

Significantly Different Hammer Subgroups for Specific Nails

                  Mean   Sub-                        Mean   Sub-
Hammer    Nail    Time   group     Hammer    Nail    Time   group
7 Claw    4 C    25.17   I         16 Claw   4 C    21.17   I
7 Claw    2 F    26.33   I         16 Claw   6 C    26.50   II
7 Claw    4 F    30.00   I         16 Claw   4 F    28.00   II
7 Claw    6 C    33.00   I         16 Claw   6 F    28.67   II
7 Claw    6 F    33.50   I         16 Claw   2 F    29.50   II
7 Claw    8 F    39.33   I         16 Claw   8 C    31.17   II
7 Claw    8 C    40.17   I         16 Claw   8 F    33.67   II
7 Claw    10 F   51.83   II        16 Claw   10 C   34.00   II
7 Claw    10 C   52.50   II        16 Claw   10 F   38.00   II
7 Claw    16 C   66.67   II        16 Claw   16 C   47.17   III
10 Claw   4 C    25.33   I         16 Rip    2 F    22.67   I
10 Claw   2 F    26.00   I         16 Rip    4 C    23.67   I
10 Claw   4 F    29.17   I         16 Rip    4 F    24.17   I
10 Claw   6 C    31.83   I         16 Rip    6 C    26.83   I
10 Claw   6 F    35.50   I         16 Rip    6 F    29.83   I
10 Claw   8 C    36.33   I         16 Rip    8 F    33.17   I
10 Claw   8 F    40.83   II        16 Rip    8 C    34.00   I
10 Claw   10 F   43.67   II        16 Rip    10 F   36.17   I
10 Claw   10 C   45.17   II        16 Rip    10 C   36.67   I
10 Claw   16 C   54.33   II        16 Rip    16 C   46.17   II
13 Claw   4 C    23.50   I         20 Rip    4 C    23.17   I
13 Claw   2 F    26.00   I         20 Rip    6 C    26.83   I
13 Claw   6 C    27.50   I         20 Rip    4 F    27.00   I
13 Claw   4 F    28.67   I         20 Rip    6 F    28.33   I
13 Claw   6 F    30.33   I         20 Rip    8 C    31.33   I
13 Claw   8 C    30.67   I         20 Rip    8 F    33.33   I
13 Claw   8 F    34.83   I         20 Rip    2 F    34.00   I
13 Claw   10 C   37.00   I         20 Rip    10 C   34.17   I
13 Claw   10 F   37.33   I         20 Rip    10 F   40.17   II
13 Claw   16 C   47.00   II        20 Rip    16 C   41.50   II

Legend:
Nail Type:
  Number refers to penny size (2 = 2 penny, 4 = 4 penny, etc.).
  C = Common wire nail; F = Finishing wire nail.
Hammer:
  Number refers to size (16 = 16 oz. hammer, etc.).
Subgroups:
  I   = Subgroup with significantly faster driving time (at 1% level).
  II  = Subgroup significantly slower than I (at 1% level).
  III = Subgroup significantly slower than I and II (at 1% level).

From an overview of Table 2 it may be 
concluded that the 10 and 7 oz. hammers 
would not be good general purpose hammers 
under conditions similar to those in this ex- 
periment. The 16 oz. and 20 oz. rip ham- 
mers, and the 13 and 16 oz. claw hammers 
appear to have better all-around character- 
istics. 
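
The overview of Table 2 can be made concrete by averaging each hammer's mean time over the ten nails. This is a rough summary only, not an analysis reported by the authors; the equal weighting of all ten nails is an assumption, the 6 C, 16 oz. claw entry is taken as 26.50 (the value listed in Table 3), and the name `mean_time_by_hammer` is hypothetical.

```python
# Mean times transcribed from Table 2; rows are nails, columns are hammers.
HAMMERS = ["7 Claw", "10 Claw", "13 Claw", "16 Claw", "16 Rip", "20 Rip"]
MEAN_TIME = {
    "4C":  [25.17, 25.33, 23.50, 21.17, 23.67, 23.17],
    "6C":  [33.00, 31.83, 27.50, 26.50, 26.83, 26.83],
    "8C":  [40.17, 36.33, 30.67, 31.17, 34.00, 31.33],
    "10C": [52.50, 45.17, 37.00, 34.00, 36.67, 34.17],
    "16C": [66.67, 54.33, 47.00, 47.17, 46.17, 41.50],
    "2F":  [26.33, 26.00, 26.00, 29.50, 22.67, 34.00],
    "4F":  [30.00, 29.17, 28.67, 28.00, 24.17, 27.00],
    "6F":  [33.50, 35.50, 30.33, 28.67, 29.83, 28.33],
    "8F":  [39.33, 40.83, 34.83, 33.67, 33.17, 33.33],
    "10F": [51.83, 43.67, 37.33, 38.00, 36.17, 40.17],
}


def mean_time_by_hammer():
    """Average each hammer's tabled mean time over all ten nails."""
    result = {}
    for column, hammer in enumerate(HAMMERS):
        times = [row[column] for row in MEAN_TIME.values()]
        result[hammer] = sum(times) / len(times)
    return result


averages = mean_time_by_hammer()
ranking = sorted(averages, key=averages.get)  # fastest hammer first

for hammer in ranking:
    print(f"{hammer:8s} {averages[hammer]:6.2f}")
```

On this summary the 7 and 10 oz. claw hammers have the slowest overall averages and the other four hammers lie close together, which agrees with the conclusion drawn in the text.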

It will be observed in Table 3 (which deals 
with subgroups of nails for individual ham- 
mers) that significant subgroups of nails were 
formed for each hammer. In some cases two 
subgroups were formed; in such cases the 
nails included in subgroup I were driven sig- 
nificantly faster with the hammer in question 
than those nails in subgroup II. In the case 
of other hammers, three subgroups of nails 
were formed; in such cases the nails included 
in subgroup II were driven faster than those 
in subgroup III, and those in subgroup I 
were driven faster than those in II. A gen- 
eral observation of Table 3 would suggest 
that smaller nails were driven faster than 
larger nails, which of course is to be ex- 
pected. It might be noted that the 4 penny 
common nails were in subgroup I for all ham- 
mers. 

It should be stressed that the data in Table 
2 are applicable for the situation where nails 
of a given size and type are to be driven, and 
it is desired to select the most efficient avail- 
able hammer for the job; this covers most 
home craftsman situations. However, it is 
conceivable that situations would arise where 
various nails could be used equally well, 
where they are to be used in quantities, and 
possibly where there is a limited choice of 
hammers; in such a situation the data in 
Table 3 would be appropriate since they show 
the relative speeds with which various nails 
were driven with specified hammers. 


Discussion 


The results are not entirely consistent with
the salesman’s intuitive judgment. The large
hammers were found to be better with the
larger nails; however, for smaller nails, the
smaller hammers were not significantly
better. This may be a function of the range of
nail sizes; if small brads had been included,
the smaller hammers might have been found
to be better, although it should be noted that
two penny finishing nails (which were used
in the experiment) are quite small, being only
one inch long and of small gauge.

The two 16 oz. hammers were expected to 
have the same hammering characteristics as 
they differ only slightly in the shape of the 
nail pulling part of the hammer head. This 
small deviation would not be expected to af- 
fect the balance of the hammer seriously, and 
no significant differences were found between 
these two types of hammers. 

The statistically significant difference be- 
tween common and finishing nails was not 
entirely expected. It was thought at first 
that the greater diameter of the common 
nails would offer greater resistance and hence 
slow up the driving. However, this same 
greater diameter presumably tended to re- 
duce the time lost due to nail bending. It is 
also possible that the appearance to the sub- 
jects of greater frailty of the finishing nails 
may have made them somewhat more cautious 
(and therefore slower) in driving the finish- 
ing nails. 

It should be kept in mind that the experi- 
ment was conducted using only fir. While 
this is a commonly used wood by the home 
craftsman, the results cannot be generalized 
with assurance to harder or softer woods. It 
might be hypothesized that the results are 
more general than this experiment indicates, 
as the relationship of the weight of the ham- 
mer to the bending resistance of the nail 
might be more crucial than the hardness of 
the wood. If this were true, it would indicate 
that the skill of the hammerer is most impor- 
tant in nailing into harder woods; but the 
relationship of the hammer and nail would be 
the same. Further research would, of course, 
be required to explore such variables. 

It is recognized that time is not necessarily 
the best criterion of performance for every 
situation; for instance, in cabinet work or 
finish carpentry, lack of mars in the wood 
undoubtedly would be a better criterion of 
performance than speed. It should be noted
that in this experiment an attempt was made
through instructions and reminders to control
the marring of the wood, but this was
not completely successful.

The entire field of study of the tools of the 
home craftsman is lacking in systematic in- 
vestigation. The methods of analysis of 
variance and Tukey’s process appear to be 
powerful tools for study in this area because 
they allow more than one variable to be 
studied at a time and yet permit specific 
recommendations to be made. In studies of 
this field, it seems advisable to plan a pilot 
experiment (as the one carried out in this 
study) to locate and control some of the un- 
expected experimental difficulties so they will 
not interfere with the main study. 


Summary 


This study was carried out to determine the 
relationship in terms of efficiency of use be- 
tween six hammers (7, 10, 13, and 16 oz. 
claw and 16 and 20 oz. rip hammers) and 
ten nails (4, 6, 8, 10, and 16 penny common 
and 2, 4, 6, 8, and 10 penny finishing nails) 
when used by home craftsmen. The six sub- 
jects were home craftsmen without profes- 
sional carpentering experience. The subjects 
drove a set of three nails into a fir 2 X 4 for
each of the sixty possible hammer and nail
combinations. Time was the criterion of
performance.

Analysis of variance was used on the data, 
and the results indicated: 


1. The variance in time due to the different 
hammers was statistically significant. 

2. The variance in time due to the different 
nails was statistically significant. 

3. There was no statistically determined 
difference between the 16 oz. rip and claw 
hammers. 

4. The finishing nails were slower to drive 
than the common nails. 

5. The variance in time due to nail by 
hammer interaction was significant. 

The data were further treated by Tukey's
process to locate various significant
subgroups of hammer-nail combinations.
Specific recommendations were made considering
first the hammer, then the nail, as the
independent variable.

The methods used were felt to be applicable
to other research in the field of home
craftsman’s tools.


Received April 23, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


A Note on “Predicting Success in Elementary Accounting” 


Robert Jacobs 
Educational Records Bureau, New York, N. Y. 


A study reported by O. R. Hendrix in the 
April, 1953 issue of J. appl. Psychol. com- 
pared the validities of the 1947 Edition of 
the ACE Psychological Examination, the 
Ohio State University Psychological Test 
(Form 23), and the latest form (Form C) 
of the Orientation Test used in the account- 
ing testing program sponsored by the Ameri- 
can Institute of Accountants. The criterion 
for validity was grades received in elementary 
accounting by 76 men and 19 women in the 
College of Commerce and Industry of the 
University of Wyoming. 

The correlations reported in this study be- 
tween accounting grades and the scores on 
the different tests ran somewhat higher for 
the ACE Psychological Examination and the 
OSU Psychological Test than for the AIA 
Orientation Test. 

On the basis of the data which he obtained, 
Hendrix concluded that “If a single test is to 
be utilized in predicting grades in elementary 
accounting, ACE Psychological Examination 
and OSU Psychological Test are preferable to 
the AIA Orientation Test.” The author of 
the study points out that the investigation 
was restricted to the relationship between the 
test scores considered and grades and that a 
different pattern of relationship might be 

found if the criterion of validity were success
in actual professional employment as an ac- 
countant. 

However, the relative superiority of the 
AIA Orientation Test when compared with 
tests of general scholastic ability in predict- 
ing success in accounting study is a matter of 
concern to counselors and to teachers when 
the Orientation Test is used in the College 
Accounting Testing Program. 

A considerable amount of research data has 
been accumulated at the project office relat- 
ing to the reliability and validity of the tests 
used in the College Accounting Testing Pro- 


1 O. R. Hendrix. Predicting success in elementary
accounting. J. appl. Psychol., 1953, 37, 15-77.


gram. Some of these data are the result of 
research carried out at the project office; 
some are results of independent studies car- 
ried out by participating schools and reported 
to the project office. As with any program 
which reaches into many institutions and 
many different kinds of situations, the data 
show a rather wide range of results. In some 
schools, correlations between Orientation Test 
scores and accounting grades have been un- 
usually high, while with other groups in dif- 
ferent schools relationships shown have been 
on the disappointing side. The usual pro- 
cedure in dealing with such an accumulation 
of data is to generalize on the basis of cen- 
tral tendencies of results. This procedure 
has been followed in reporting on the validity 
and the reliability of the instruments used in 
the College Accounting Testing Program. The 
point is that it is usually an unsafe procedure 
to generalize from a single study based on a 
particular group of students. If the results 
obtained with one group are borne out with 
data from similar research based on different 
groups, it may be safe to generalize a finding 
or a trend. 

Most of the comparative validity data 
gathered at the project office has been con- 
cerned with a comparison of the Orientation 
Test and the ACE Psychological Examina- 
tion. This is true because the ACE test is 
the most widely used test of scholastic ability 
at the college level, and hence, most of the 
questions concerning superiority of the Ori- 
entation Test coming from institutions par- 
ticipating in the program related to the ACE 
test which, commonly, was part of the battery 
of tests already used in the college. The data 
from several of these studies are shown in 
Table 1, together with the results reported by 
Hendrix. 

The Table 1 correlations show varying re- 
sults, but only in the Hendrix study does the 
difference between the pair of correlations 
favor the ACE test. The data reported for 




Table 1 


Correlations between Orientation Test Total Score and Grades in Accounting Courses Compared with 
Correlations between ACE Psychological Examination Total Score and Accounting Grades * 


                                                   Orient. vs. Grades     ACE vs. Grades
Source of Data         Institution                    N        r            N        r
Project Office Study   Drake University              363      .39          294      .27
Project Office Study   U. of Louisville Group A      166      .43          161      .22
Project Office Study   U. of Louisville Group B      133      .38          134      [illegible]
Project Office Study   Wayne University              265      .37           99      .28
Roth Study             CCNY                          148      .23          148      .19
Hendrix Study          University of Wyoming          95      .32           95      .36


* In most instances the ACE Psychological Examination is administered at the beginning of the freshman
year and the Orientation Test at the beginning of the sophomore year, but before the students have completed
a semester of accounting study.


the project office studies show differing N’s
for the comparative correlations. This may
raise some questions regarding the validity of
comparisons. The correlations between other
test scores and accounting grades were ob-
tained as supplementary studies following the
checks on relationships between Orientation
Test scores and grades. Scores on other tests
were not available for all students taking the
Orientation Test, with the exception of the
University of Louisville Group B, but so far
as is known, no bias occurred in the use of
the smaller population. The difference in N
is of most concern in the Drake University
and Wayne University data, and it will be
noted that the superiority of the Orientation
Test is less noticeable in these two instances
than in the case of the two University of
Louisville groups where the N’s are in closer
agreement. Furthermore, the study in which
the N’s were the same, the one carried on
by Roth at CCNY (unpublished), shows as
much difference in favor of the Orientation
Test as does Hendrix’s study in favor of the
ACE.

However, the point of this short note is not
so much to argue the superiority of the Ori-
entation Test as to suggest the danger in gen-
eralizing the superiority of one testing in-
strument over another on the basis of a study
using the results from a single institution.

The Hendrix study reports the only com-
parison between the Orientation Test and the
OSU Psychological Test which has come to
the attention of the project office (OSU test
vs. grades = .37; AIA test vs. grades = .32).
As indicated with the ACE exam data, how-
ever, it would be hazardous to generalize on
the basis of this one bit of evidence.
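
The caution about a single sample can be illustrated with a rough calculation. The sketch below uses the standard Fisher z transformation, which is not part of the original note, to put an approximate 95 per cent confidence interval around each of the two Wyoming correlations quoted above (N = 95). The function name `fisher_ci` and the 1.96 multiplier are choices made for this illustration; because both correlations come from the same students, comparing the intervals is only indicative.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation r based on n cases,
    using the Fisher z transformation (illustrative helper, not from the note)."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the r scale

# Values quoted in the text for the Wyoming group (N = 95).
osu_lo, osu_hi = fisher_ci(0.37, 95)
aia_lo, aia_hi = fisher_ci(0.32, 95)

print(f"OSU vs. grades: r = .37, interval {osu_lo:.2f} to {osu_hi:.2f}")
print(f"AIA vs. grades: r = .32, interval {aia_lo:.2f} to {aia_hi:.2f}")
```

Under these assumptions the two intervals overlap over most of their range, which is consistent with the view that a difference of .05 in one group of 95 students is weak evidence.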

It is believed by this writer that a further
note of caution could be added to Hendrix’s
summary to the effect that “It does not nec-
essarily follow that the same relationship
would be obtained in a different institution
and with a different group of students.” The
data shown in Table 1 indicate that results
do differ from one group to another, and they
suggest further that the general trend in com-
parative validity tends to favor the Orienta-
tion Test rather than the ACE Psychological
Examination.


Received September 24, 1953. 
Published out-of-turn by the editor. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


“A Note” Acknowledged 


O. R. Hendrix 
Office of Student Personnel and Guidance, The University of Wyoming 


In the preceding article, Jacobs has sug- 
gested that a further note of caution could 
be added to the summary of the study re- 
ported by this writer in the April, 1953 issue 
of the Journal of Applied Psychology. The
suggested note is: “It does not necessarily 
follow that the same relationship would be 
obtained in a different institution and with a 
different group of students.” 

One could certainly have no objection to 
such a statement. As Jacobs points out, nu- 
merous correlation studies have verified its 
accuracy. This writer most certainly did not 
intend to imply that the results of his limited 
study had general application. In fact, he 
tried to guard against such an assumption by 
stating in his opening paragraph that the 
study reported represented an investigation 
of the relative validity of the several tests 
“for predicting success in elementary account- 
ing at the University of Wyoming” (italics 
added). 

While concurring with Jacobs’ desire to 
guard against generalization on the basis of 
a single study, one should be equally careful 
to keep a number of limitations of Jacobs’ 
own study in mind while considering his
statement concerning “the relative superiority 
of the AIA Orientation Test when compared 
with tests of general scholastic ability in pre- 
dicting success in accounting study. . . .” 

First, there is the limitation growing out 
of the differences in the number of cases 
used in the computation of the coefficients of 
correlation between grades and Orientation 
Test scores and the number of cases used in 
computing the coefficients of correlation be- 
tween grades and the ACE. In the instance 
of the Wayne University data, 265 cases were 
used for one computation and only 99 for the 
second computation, and in the Drake Uni-
versity study the number of cases was 363 in
one instance and 294 in another. While the 
author assumed that no bias occurred in re- 


ducing N, the possibility that bias did occur 
cannot be ruled out. 

A second limitation grows out of the fact 
that “in most instances” the Orientation Test 
was administered a year later than the ACE. 
One might assume that learning which took 
place during this year had no effect on the 
correlation between Orientation Test scores 
and accounting grades. One would also have 
to consider the possibility that the year of 
learning influenced either or both test scores 
and grades and consequently affected the cor- 
relation between the two. If the year of 
learning involved any accounting, one is con- 
fronted with an interesting effort to predict 
aptitude for learning that which has already 
been learned. 

A third limitation is that inherent in mak- 
ing generalizations about “the relative su- 
periority of the AIA Orientation Test when 
compared with tests of general scholastic 
ability . . .” on the basis of studies limited 
to comparison of the Orientation Test and a 
single scholastic ability test, namely the ACE. 

Possibly in an effort to keep his note brief, 
Jacobs has failed to mention the possibilities 
for more accurate prediction through the use 
of a number of predictors rather than a single 
predictor. Well-trained counselors seldom de- 
pend upon a single predictor. An increasing 
number of counseling agencies are construct- 
ing prediction equations based upon multiple 
variables. The question of whether the Ori- 
entation Test contributes significantly to such 
equations still has to be answered. 

It is entirely possible that studies in which 
the above listed limitations are not operative 
would provide proof that the AIA Orientation 
Test is superior to tests of general scholastic 
ability. Until such studies are cited, one 
would seem justified in retaining an open 
mind on the subject. 

Received November 4, 1953. 

Published out-of-turn by the editor. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


The Relation of Ninth Grade Test Scores to Twelfth Grade Test 
Scores and High School Rank 


Wilbur L. Layton 


Student Counseling Bureau, University of Minnesota 


The 9th grade is a crucial one for most 
students, for they, their school counselors, 
teachers and administrators must make deci- 
sions which are important for the students’ 
high school careers and in fact for their en- 
tire futures. High school guidance workers 
test many 9th grade students in order to as- 
sist them to select appropriate high school 
curricula. 

This study was an attempt to determine 
the meaning of 9th grade tests as predictors 
of over-all high school achievement and 12th 
grade test scores. 

In January and February of 1949, the 1947 
High School Edition of the ACE Psychologi-
cal Examination was administered to approxi-
mately 15,000 ninth grade students in Min-
nesota through the state-wide high school
testing program administered by the Student 
Counseling Bureau of the University of Min- 
nesota. The students tested in this program 
were from schools volunteering to participate 
in the program at their own expense. These 
schools consisted of approximately 50 per 
cent of non-metropolitan high schools in Min- 
nesota. Approximately 10,000 ninth graders 
were also given the Cooperative English Test, 
Form Y, Lower Level, Single Booklet Edition, 
Mechanics of Expression, Effectiveness of Ex-
pression and Reading Comprehension. Three


Table 1 


N’s, Means and Standard Deviations for 9th Grade Test 
Scores and 12th Grade Test Scores and 
High School Percentile Rank 


                  N       Mean    Standard Deviation
9th ACE         2,173     67.9        18.4
12th ACE        2,185     94.7        24.0
9th English       690    155.6        64.4
12th English    2,185    172.7        42.2
12th HSR        2,185     50.8        28.7


years later, in the winter of 1952, all the high 
school seniors in the state, including many of
the 9th grade students tested in 1949, were
tested on the 1947 College Edition of the
ACE Psychological Examination and Coop-
erative English Test, Form S, Lower Level,
Mechanics of Expression and Effectiveness
of Expression. High school percentile ranks
(HSR) were procured from the high schools
for these seniors. The HSR was based on
the senior’s scholastic rank in his class at the
end of three and one-half years of work.

A sample of 2,185 men and women who had
been tested as freshmen was pulled from the
files. Correlations were computed between
9th grade total ACE raw score, 9th grade 


Table 2 


Coefficients of Correlation between 9th Grade Test Scores and 12th Grade Test Scores and 
High School Percentile Rank * 


                          ACE            Coop. Eng.         HSR
Tests                 (12th Grade)      (12th Grade)    (12th Grade)
ACE (9th Grade)        .80 (2169)        .71 (2171)      .63 (2173)
English (9th Grade)    .75 (681)         .82 (683)       .71 (690)
ACE (12th Grade)                         .74 (2185)      .65 (2185)
English (12th Grade)                                     .74 (2185)


* In parentheses following the coefficient is given the number of cases upon which each coefficient is based.




total Cooperative English raw score and 12th 
grade total ACE raw score, 12th grade Co- 
operative English total raw score and HSR. 
Table 1 presents the means and standard 
deviations for each of the variables. 

As Table 2 shows, there was a substantial 
relationship between the 9th grade tests and 
the corresponding tests given in the 12th 
grade and with HSR. High School ACE 


taken in the 9th grade correlated .80 with 
College ACE taken in the 12th grade and .63 
with HSR. These results indicate the extent 
to which the high school counselor can inter- 
pret 9th grade test scores as predicting high 
school achievement and 12th grade test scores 
and can use these predictions to counsel 9th 
grade students. 
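
One way a counselor might turn these figures into a prediction is the ordinary two-variable regression equation. The sketch below is an illustration only: it assumes a linear relationship, takes the means and standard deviations from Table 1 and r = .80 from Table 2, and ignores the slight differences in N among those figures. The function names are hypothetical.

```python
import math

# Figures from Tables 1 and 2 (total ACE raw scores).
MEAN_9TH, SD_9TH = 67.9, 18.4      # 9th grade High School ACE
MEAN_12TH, SD_12TH = 94.7, 24.0    # 12th grade College ACE
R = 0.80                           # correlation between the two

def predict_12th_ace(score_9th):
    """Predicted 12th grade ACE raw score from a 9th grade ACE raw score,
    using the least-squares regression line (assumes linearity)."""
    slope = R * SD_12TH / SD_9TH
    return MEAN_12TH + slope * (score_9th - MEAN_9TH)

def standard_error_of_estimate():
    """Typical size of the prediction error, in 12th grade raw-score units."""
    return SD_12TH * math.sqrt(1.0 - R ** 2)

for score in (50, 68, 86):
    print(score, round(predict_12th_ace(score), 1))
print("standard error of estimate:", round(standard_error_of_estimate(), 1))
```

With r = .80 the standard error of estimate is still about 14 raw-score points, so a prediction for an individual student carries considerable uncertainty.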


Received February 27, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Selecting Items for Interest Inventory Keys ¹


Kenneth E. Clark and Helen H. Gee 


University of Minnesota 


The use of vocational interest measures as- 
sumes that workers in a given occupation have 
in common certain likes and dislikes, and that 
these preferences are different from those of 
workers in other occupations. The extent to 
which an individual’s interest patterns match 
those of a group is determined by use of a 
scoring key on an interest inventory. This 
key is developed by using those responses 
which are made more frequently by the spe- 
cific occupational group than by men-in- 
general (scoring these responses “plus”) and 
those responses made less frequently by the 
specific occupational group (scoring these re- 
sponses “minus”). How great the difference 
in response must be in order for a response to 
be scored is a difficult question to answer. 
The difference must be large enough to reduce 
to a negligible amount the number of chance 
differences. Yet the number of responses 
scored must not be so small as to yield a key 
which is too unreliable for use with individ- 
uals. Between these two limits it is still possi- 
ble to develop many different keys possessing 
rather widely varying characteristics. 

This paper summarizes work which has 
been done in trying out various methods for 
the development of scoring keys for the U. S. 
Navy Vocational Interest Inventory. The 
reader will note that this work is strictly em- 
pirical, although the ideas which are tried 
out arise from theoretical work. To the ex- 
tent that interest inventory responses are 
unique in their psychometric characteristics, 
the findings of this report are limited in ap- 
plication. It seems reasonable to assume, 
however, that similar methods of key de- 
velopment would produce similar results when 
applied to such related measures as per- 

1 The research reported herein was carried out un- 
der Contract N6ori-212, T.O. III, NR 151-248, be- 
tween the Office of Naval Research and the Uni- 
versity of Minnesota, and, in part, under a grant 
from the Graduate School of the University of Min- 
nesota. Able assistance in major parts of the work 


reported here was given by Mrs. Carolyn C. White 
and Mr. Norris Ellertson of the project staff. 


sonality inventories, biographical records, and 
the like. 


Samples Used 


Two occupational groups have been used, 
one civilian and one military. The civilian 
group is composed of 189 electricians obtained 
through labor union sources in St. Paul, Min-
nesota, and, for cross-validation use, 174 elec-
tricians similarly obtained in Minneapolis.
Keys were developed by comparing their re-
sponses with those of members of other oc-
cupational groups from St. Paul and Min-
neapolis. These were: milk wagon drivers,
painters, plasterers, bakers, sheet metal work-
ers, printers, warehousemen, plumbers, ma-
chinists, shipping clerks, pressmen.

The Navy group is composed of a sample
of 261 Aviation Machinist’s Mates (AD’s)
obtained through Receiving Stations on the
east and west coasts, and a sample of 2[illegible]
AD’s for cross-validation purposes obtained
from the Naval Air Technical Training Com-
mand at Memphis. The Navy men-in-gen-
eral sample used to determine the amount of
overlap obtained for various keys is a sample
of 200 men drawn randomly from a sample
of 1,000 Navy rated men who had been drawn
from the total Receiving Station sample in
such a way as to reflect the distribution of
rates in the Navy as a whole. The entire
sample of 1,000 was used to obtain the
percentages of responses of men-in-general
needed in the development of keys.


Criteria of a “Good” Key 


For purposes of this study, a scoring key
is considered good if it does a good job of
separating workers in a given occupation from
workers-in-general. Thus, a key for Gunner’s
Mates would perform its function well if the
distribution of scores of GM’s was markedly
different from a distribution of scores of men
in another rate, or of men in a variety of dif-
ferent rates. In the following pages, the
index of separation of such distributions




which shall be used is “percentage overlap.” 
This index gives the number of persons per 
hundred in one distribution whose scores can 
be matched by scores in the other distribu- 
tion. Perfect separation occurs when the 
highest score in one distribution is lower than 
the lowest score in the other; in this instance 
the percentage overlap is zero. No separation
at all can be made if the two distributions are
identical. When this occurs the percentage
of overlap is 100.²
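
The index can be sketched in two ways. The first function below counts the overlap directly from two sets of scores by summing, score by score, the smaller of the two relative frequencies. The second gives the value for two normal distributions of equal spread whose means lie d standard deviations apart; that model is an assumption of this illustration, offered as an approximation to tabled values rather than a reproduction of them. The function names are hypothetical.

```python
import math
from collections import Counter

def empirical_overlap(scores_a, scores_b):
    """Per cent of one distribution that can be matched, score for score,
    by the other: the sum of the smaller relative frequency at each score."""
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    n_a, n_b = len(scores_a), len(scores_b)
    shared = sum(min(freq_a[s] / n_a, freq_b[s] / n_b)
                 for s in set(freq_a) | set(freq_b))
    return 100.0 * shared

def normal_overlap(d):
    """Per cent overlap of two normal curves with equal standard deviations
    whose means differ by d standard deviations (assumed model)."""
    phi = 0.5 * (1.0 + math.erf((-abs(d) / 2.0) / math.sqrt(2.0)))
    return 100.0 * 2.0 * phi

print(empirical_overlap([1, 2, 3], [1, 2, 3]))   # identical distributions
print(empirical_overlap([1, 2, 3], [7, 8, 9]))   # no scores in common
print(round(normal_overlap(0.0)), round(normal_overlap(1.0)))
```

Both functions return 100 for identical distributions and fall toward zero as the distributions separate, matching the two limiting cases described above.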

A second criterion used in the evaluation of 
a scoring key is its reliability. In this report 
reliability is reported as test-retest reliability, 
obtained by scoring the interest inventories of 
90 men students at Dunwoody Industrial In- 
stitute, Minneapolis, who took the inventory 
twice, with an interval of about one month be- 
tween administrations. 

A different sort of criterion which may be 
used to evaluate methods of scoring the in- 
terest inventory would be the relative success 
of various keys for the prediction of school 
success or of advancement in rate, or the pre- 
diction of re-enlistment, or the prediction of 
military failure as evidenced by records of 
disciplinary action or less than honorable dis- 
charge. These methods of evaluation are 
obviously more pertinent for the application 
of interest inventory scores, but require the 
passage of a considerable period of time after 
administration of the inventory, to permit the 
individual to have a chance to achieve or fail 
to achieve. Accordingly, these criteria are 
not used in this report. One might expect 
that keys which do a good job of separating 
groups would prove to be the same sorts of 
keys that would prove useful for these other 
purposes, but there is, as yet, insufficient evi- 
dence to warrant this expectation in the mili- 
tary service. Data have been collected, how- 
ever, which will give evidence on this point 
after a sufficient interval of time has elapsed. 

In any development of scoring keys based 
upon empirical methods, there is always the 
possibility that differences between groups 


2 This is the index of overlap suggested by Tilton
(Tilton, J. W. The measurement of overlapping. J.
educ. Psychol., 1937, 28, 656-662). Tilton’s ar-
ticle provides tables which may be entered using the
difference in means for the two distributions divided
by the average of their standard errors. For other
characteristics of this index, see Tilton’s article.


used to select responses for scoring are chance 
differences which, upon cross-validation, will 
tend to disappear. Accordingly, for each of 
the keys developed and reported upon in this 
report, a cross-validation sample has been 
used to determine the amount of regression 
to be expected. In addition, differences gen- 
erally have been required for scoring which 
are large enough to be well beyond the limits 
within which chance factors would be ex- 
pected to operate; this method of operating 
seemed desirable since each key is made up 
of only a small number of items selected from 
a total pool of 1140 item responses. 


Optimal Number of Items in a Scoring Key 


Finding no adequate rationale for deter- 
mining a priori the number of item responses 
to score in developing an occupational key for 
the vocational interest inventory, attempts 
were made to make this determination em- 
pirically. This work was started with the 
hope that scoring could be done with less 
effort than is required with the Strong Voca- 
tional Interest Blank, which does a good job 
of separating out occupational groups, but at 
the expense of a weighting of many item 
responses to get a score. (Strong assigns 
weights varying from plus four to minus four 
to as many as five or six hundred of the twelve 
hundred possible responses to his blank.) 

The first work done to determine how best 
to develop a scoring key was done with the 
civilian electrician sample. A series of scoring 
keys was developed on the basis of the differ- 
ence in responses of the electrician and other 
skilled trades groups, as follows: a 6% key 
was developed by using all item responses 
with differences in percentage responses of 
electricians and tradesmen-in-general of six 
per cent or more. In like manner, a 7% key, 
an 8% key, a 9% key, and so on, were de- 
veloped. The series was stopped at a 26% 
key, when only 21 items remained for scoring. 
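
The construction of such a series of keys can be sketched as follows. The percentages and item labels are invented for illustration and are not data from the study, and the function names are hypothetical. A response is scored plus when the criterion group gives it more often than the reference group by at least the stated number of percentage points, and minus when it gives it less often by that margin.

```python
def build_key(pct_criterion, pct_reference, min_diff):
    """Return {response: +1 or -1} for every response whose percentage
    difference between groups is at least min_diff points (unit weights)."""
    key = {}
    for response, p_crit in pct_criterion.items():
        diff = p_crit - pct_reference[response]
        if abs(diff) >= min_diff:
            key[response] = 1 if diff > 0 else -1
    return key

def score(person_responses, key):
    """Sum the unit weights of the keyed responses a person gave."""
    return sum(key.get(r, 0) for r in person_responses)

# Invented percentages giving each response (illustration only).
electricians = {"splice wires": 70, "bake bread": 10,
                "repair circuits": 64, "set type": 22}
tradesmen = {"splice wires": 40, "bake bread": 30,
             "repair circuits": 55, "set type": 25}

for threshold in (6, 12, 26):
    key = build_key(electricians, tradesmen, threshold)
    print(f"{threshold}% key: {len(key)} responses scored")

key_6 = build_key(electricians, tradesmen, 6)
print(score({"splice wires", "bake bread"}, key_6))
```

Raising the threshold shortens the key, which is the trade-off the series of keys from 6% to 26% was built to examine.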

The comparative merits of each one of 
these keys may be inferred from the data 
presented in Table 1. These data indicate 
the existence of an optimal point in key de- 
velopment, since greatest separation occurs 
neither at the end of the scale with the small- 
est number of items, nor at the end of the 




Table 1

Comparison of Various Electrician Scoring Keys in Terms of Overlap and Several
Estimates of Reliability

                        Per Cent Overlap
        No. of                     Cross-        Test-Retest
        Items      Original      Validation      Reliability
Key     in Key     N = 189        N = 174
 6%      580         51             50              .84
 7%      493         49             52              .83
 8%      402         47             49              .81
 9%      345         48             48              .80
10%      289         44             47              .81
11%      234         46         [illegible]         .80
12%      201         46             44              .80
13%      171         44             42              .79
14%      140         44             37              .78
15%      116         41             39              .78
16%      103         41             39              .78
17%       87         39             36          [illegible]
18%       72         40             35              .79
19%       62         38             31              .79
20%       55         37             29              .80
21%       44         37             27              .80
22%       43         37             26              .81
23%       40         39             31              .81
24%       30         42             31              .78
25%       24         48             34              .77
26%       21         50             34              .78


scale with the largest number of items. Keys 
with smaller numbers of items, in general, are 
to be preferred. It seems safe to conclude 
that, as one starts with a small number of 
items, the addition of more items increases 
the differentiating power of the key only so 
long as these items contribute more unique- 
ness than error; as error increases, the stand- 
ard deviations of both the criterion and men- 
in-general groups increase enough to offset the 

additional increase in mean difference con-
tributed by these items.

With a small number of items, however, 
some attention needs to be given to problems 
of reliability. When the only estimates of 
reliability that were available were made by 
other than test-retest means, this problem 
seemed serious enough to warrant the sacri- 
fice of considerable validity in order to 
achieve minimum reliability. As Table 1 in- 
dicates, however, very little is lost in the way 
of test-retest reliability, by a radical reduc- 
tion in the number of items scored. 




A check on this generalization, that the
number of items in a key affects its validity,
has been made as part of another study using the
Strong Vocational Interest Blank. In that
study the best key was the one with the small-
est number of items scored (24 items). How-
ever, these were responses of psychologists,
with no sample of answer sheets for men-in-
general, so that a different measure of good-
ness-of-key than percentage overlap was used.
No evidence on test-retest reliability of this
set of keys was obtained. Even so, it would
seem that unit weighting of a fairly small
number of items is warranted for scoring of
vocational inventory responses.


Effects of Weighting 


While the data on which the decision to
use unit weights was based are fragmentary,
they indicate clearly that superior separation
of groups can be attained by use of such unit
weights. Thus, per cent overlap between
electricians and tradesmen-in-general was
37% with the best unit-weights key, and was
53% with a key weighted according to the
formula used by Strong. The same figures
for printers were 40% and 57%, respectively.
Scoring the Strong blank, using the best unit-
weights key, placed men-in-general 3.71 stand-
ard deviations below the mean for psycholo-
gists in the original sample, and 4.03 standard
deviations below the mean for psychologists in
the cross-validation sample. Using Strong’s
method of weighting, men-in-general fell 3.2[illegible]
standard deviations below the mean for psy-
chologists.³

These comparisons do not, of course, in-
dicate that weighting would not improve
separation of groups. In fact, the entire
literature on multiple regression would sug-
gest otherwise. What they do indicate is that



a simpler scoring system can separate groups 
as well as does the more involved method 
used with the Strong Vocational Interest 
Blank. In the interest of economy of scoring, 
it thus seems profitable to use unit weights 
until such time as a real superiority of mul- 
tiple weights is demonstrated. 


Heterogeneity of Content of Keys 


The selection of item responses for scoring 
solely on the basis of the percentage differ- 
ence in response of a reference group and a 
criterion group will tend, presumably, to give 
an over-representation of items reflecting cer- 
tain aspects of the interests of the criterion 
group, and under-representation of other 
aspects. Thus, in developing a key for elec- 
tricians, it might well be that 30 responses 
indicating a man liked to splice wires, repair 
circuits, and the like, might be scored, whereas 
only one response indicating that a man 
wanted to study in the area of mathematics, 
electrical engineering, and physics might be 
scored. Yet both of these kinds of responses 
are characteristic of the responses of elec-
tricians.

In a sense the use of weights might be con- 
sidered as an attack on this problem. Most 


weighting is done, however, on the basis of the


magnitude of the difference between men-in- 
general and the specific group, rather than on 
the basis of the amount of the factor already 
measured by other item responses. To devise 
an economical procedure for computing such 
weights directly would be a genuine contribu- 
tion. This project has not done so. In the 
absence of such a procedure, approximation 
methods must be employed. 

The first method employed in this study to 
improve the composition of a scoring key was 
an attempt to avoid including in a key too 
large a number of items reflecting the central 
core of interests of an occupational group. 
An iterative method of item selection was 
therefore employed. First, the best ten items 
were selected; these were the items on which 
the responses of the criterion group differed 
most from the responses of the reference 
group. All members of the criterion group 


were then scored for their responses to these


items. Another ten items were then selected; 
for each of these the difference in responses 


between reference and criterion groups was 
still large, and the correlation with the com- 
posite of the first ten items was negligible. 
Another set of ten valid items (i.e., differen- 
tiating between criterion and reference groups) 
which did not correlate with these first twenty 
was then selected. Finally, ten more valid 
items unrelated to the first thirty were se- 
lected. This key is therefore a fairly hetero- 
geneous key which omits a rather large num- 
ber of items even though they differentiate 
members of the occupational group from 
tradesmen-in-general. 
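
The iterative procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the project's own computing routine: the cut-offs `min_diff` (what counts as a "still large" difference) and `max_r` (what counts as a "negligible" correlation) are invented defaults, the demonstration data are random, and all names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def iterative_key(responses, diffs, batch=10, rounds=4, min_diff=10.0, max_r=0.10):
    """Select items in successive batches.

    responses: one list of 0/1 item responses per criterion-group member
    diffs:     percentage difference (criterion minus reference) per item
    The first batch takes the items with the largest differences; each later
    batch takes valid items (|diff| >= min_diff) whose correlation with the
    composite of items already chosen is at most max_r in absolute value.
    """
    order = sorted(range(len(diffs)), key=lambda i: -abs(diffs[i]))
    selected = order[:batch]
    for _ in range(rounds - 1):
        # Composite score of each person on the items chosen so far,
        # scoring a response +1 or -1 according to the sign of the difference.
        composite = [sum((1 if diffs[i] > 0 else -1) * person[i] for i in selected)
                     for person in responses]
        candidates = [i for i in order
                      if i not in selected
                      and abs(diffs[i]) >= min_diff
                      and abs(pearson([p[i] for p in responses], composite)) <= max_r]
        selected = selected + candidates[:batch]
    return selected

# Synthetic demonstration: 200 people, 60 items, random responses and differences.
rng = random.Random(0)
n_people, n_items = 200, 60
responses = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n_people)]
diffs = [rng.uniform(-30, 30) for _ in range(n_items)]

key_items = iterative_key(responses, diffs)
print(len(key_items), "items selected")
```

With four rounds of ten the key holds at most 40 items, and a batch may come up short when too few remaining items meet both conditions.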

The first groups on which this type of key 
was tried were civilian electricians. The elec- 
trician key which had been developed by 
simpler means, taking all item responses with 
a given percentage difference for the criterion 
and reference group, was already a satisfac- 
tory key. The percentage overlap of distribu- 
tions of scores of electricians and tradesmen- 
in-general was only 35% in the original group, 
and 41% in the cross-validation group. 

Even so, the use of the iterative method for 
selecting items for scoring in a key reduced 
overlap to 30% in the original sample, and 
to 35% in the cross-validation sample. And 
this is done without any real drop in the re- 
liability of the key, even though only 40 item 
responses are scored. 

The same comparison of an original key 
(developed by using all items showing a given 
minimum difference between criterion and 
reference groups) and a key developed by 
iterative methods was made using samples 
of Aviation Machinist’s Mates (AD’s) ob- 
tained from Navy sources. The AD key de- 
veloped by original methods is not a very 
good key in terms of its separation of AD’s 
from Navy men-in-general, since the overlap 
of these two groups is relatively high—65% 
for the original group, and 58% for the cross- 
validation group. Its reliability is, however, 
rather good. On the other hand, the key de- 
veloped by iterative methods is a distinctly 
better key than that developed by original 
methods when one looks at the overlap be-
tween groups, but has a reliability of only .74.
These findings are in accord with those ob-
tained with civilian electricians, except that
differences are greater between different keys. 
(The reader should not generalize from these 




Kenneth E. Clark and Helen H. Gee 


Table 2
Summary of the Characteristics of Various Scoring Methods Applied to Three Criterion Samples

                                          Per Cent Overlap
                               No. of               Cross-       Test-Retest
Group            Type of Key   Items     Original   Validation   Reliability
Electricians     Original        77        35%        41%           .88
                 Iterative       40        30         35            .86
                 Gulliksen       63        28         31            .86
Rec. Sta. AD’s   Original        83        65%        58%           .85
                 Iterative       42        51         51            .74
                 Gulliksen       49        56         51         [illegible]


two groups and assume that Navy groups are 
consistently harder to separate—the AD 
group was selected because it is a group that 
gives relatively poor separation from other 
Navy groups, and hence provides a severe 
test of the value of the various methods tried 
with a better-separation group.) 

In hopes of developing still better keys at 
less cost for computation, another type of key 
was tried. The method of developing this 
key requires selection of a fairly sizable pool 
of items, perhaps 100 or more, by taking those 
items with high validities, and then eliminat- 
ing those with high indices of internal con- 
sistency and only moderate validity. This 
type of key has been labeled, for want of a 
better title, the “Gulliksen Key,” since the 
steps taken are similar to those proposed by 
Gulliksen.4 Specifically, a key is developed
by selecting all items for which the criterion 
group response differs from that of the refer- 
ence group by a given amount or more (gen- 
erally, 12 to 15 percentage points). A large 
(1,000 for Navy, 550 for civilian groups) 
men-in-general sample is then scored using 
this key. The top and bottom 27% of this 
distribution is used to obtain an estimate of 
the reliability of each item; the difference in 
responses of the criterion and reference groups 
is used as an estimate of the validity of the
item.5 These two values are then plotted
against each other much in the manner described
by Gulliksen (op. cit., p. 384) and items
selected much as he recommends. The general
effect of the method is to give preference to
items which have good validity and which do not
correlate highly with other items in the pool.

4 Gulliksen, H. Theory of mental tests. New
York: John Wiley & Sons, Inc., 1950. See especially
pp. 382-385.

5 The reliability and validity indices when
expressed in correlation terms have yielded, in this
work, keys with almost identical characteristics as
those obtained using percentage differences.
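
The two item indices described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout and function names are hypothetical, and because the article selects items from a plot rather than by a stated rule, the pruning rule (with its cutoffs `max_consistency` and `strong_validity`) is only a stand-in for "eliminating those with high indices of internal consistency and only moderate validity."

```python
def upper_lower_index(sample, key_items, fraction=0.27):
    """Per-item difference in endorsement between top and bottom scorers.

    sample: list of dicts, item -> 0/1, one dict per person in the
            men-in-general sample (hypothetical layout).
    """
    scores = [sum(person[i] for i in key_items) for person in sample]
    order = sorted(range(len(sample)), key=lambda p: scores[p])
    k = max(1, round(fraction * len(sample)))
    low, high = order[:k], order[-k:]
    index = {}
    for item in key_items:
        p_high = sum(sample[p][item] for p in high) / k
        p_low = sum(sample[p][item] for p in low) / k
        index[item] = p_high - p_low
    return index


def prune(validity, consistency, max_consistency=0.8, strong_validity=0.25):
    """Drop items that are highly consistent with the key but only moderately valid.

    Both cutoffs are assumptions made for this sketch.
    """
    return [i for i in validity
            if not (consistency[i] >= max_consistency
                    and validity[i] < strong_validity)]


# Toy men-in-general sample of eight people scored on a four-item preliminary key.
toy_sample = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 1, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
]
# Hypothetical criterion-minus-reference differences (as proportions).
toy_item_validity = {"a": 0.30, "b": 0.15, "c": 0.14, "d": 0.20}

toy_consistency = upper_lower_index(toy_sample, ["a", "b", "c", "d"])
print(toy_consistency)
print(prune(toy_item_validity, toy_consistency))
```

In the toy data item b is dropped: it moves closely with the total score but differentiates the groups only moderately, so it adds little that item a does not already supply.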

It should be noted that this method is another
approximation method, and is designed to select
items having somewhat the same characteristics
as the items selected by the iterative method.
The Gulliksen method as here used is somewhat
easier to employ, is more readily adapted to
I.B.M. methods, and hence is more practical than
the iterative method. It should also be noted
that the values used as estimates of reliability
and validity of items differ from those outlined
in Gulliksen, since in this analysis gross
percentage differences are used in estimating
these item characteristics.

The comparison of overlaps and reliabilities
of all of these new keys with the original
keys developed for electricians and the Navy
AD group is summarized in Table 2. In both
instances, the Gulliksen key is distinctly
superior to the original key in terms of overlap
and is perhaps better than the iterative key.
The superiority of both methods over the
original key is retained in the cross-validation
samples as well. In both the electrician and
the Navy AD samples this gain seems large
enough to warrant the use of the new key in
spite of the fact that this key has a lower
reliability than the original key.

As noted above, a best unit-weights key
for psychologists using the Strong blank
resulted in superior separation of psychologists


Selecting Items for Interest Inventory Keys 17 


from men-in-general as compared with a key 
weighted according to the formula used by 
Strong. The Gulliksen method described 
above was also applied to the Strong data 
but with a slight modification. Since no 
sample of answer sheets for men-in-general 
was available it was necessary to base esti- 
mates of item reliability on the criterion 
group. The top and bottom 27% of a sub- 
sample of 604 psychologists were accordingly 
used. A 95-item key resulted from applica- 
tion of the Gulliksen method which differed 
very little in its effectiveness from the best 
unit-weights key previously mentioned. Using 
the Gulliksen key placed men-in-general 3.78 
standard deviations below the mean for psy- 
chologists as compared with 3.71 for the best 
unit-weights key. On cross-validation the 
comparable figures were 3.79 for the Gullik- 
sen and 4.03 for the best unit-weights key. 
It is to be noted, however, that this best unit- 
weights key contained only 24 items, and 
while test-retest reliability is not available, 
it is doubtful if it would be found to be ade- 
quate. This key included all items on which 
psychologists and men-in-general differed by 
33% or more in their responses. The Gulliksen
key used items with as low as 18% difference.
On a best unit-weights key of 91 items
(including all items with 24% or greater
difference between psychologists and
men-in-general), and in the sense of number of items
more nearly comparable to the Gulliksen key,
the men-in-general means were 3.26 and 3.42
standard deviations below the means for
psychologists on test and cross-validation groups.
The implication is clear that item for item,
the Gulliksen key results in superior separation,
but interpretation must be cautious since
information on reliabilities of these keys is
not available.
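
The separation figures quoted above express the distance between group means in standard deviation units. A minimal sketch of that arithmetic follows; it assumes the unit is the standard deviation of the criterion group, which the text does not state explicitly, and the scores are invented solely to show the calculation.

```python
from statistics import mean, stdev


def separation_in_sd(criterion_scores, reference_scores):
    """Distance of the reference mean below the criterion mean, in criterion SDs.

    Using the criterion group's SD as the unit is an assumption of this sketch.
    """
    gap = mean(criterion_scores) - mean(reference_scores)
    return gap / stdev(criterion_scores)


# Invented scores, for illustration only.
toy_criterion = [10, 12, 14]
toy_reference = [4, 6, 8]
print(separation_in_sd(toy_criterion, toy_reference))
```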


Summary 


The development of a method of scoring 
responses to an interest inventory so as to 
maximize the separation of workers in an 
occupation from workers in general involves 
consideration of many factors. Taking a cue 
from applications of multiple regression
techniques, we would expect that a point would
be reached when the addition of more items
in a scoring key would not be profitable; that, 
in general, the greater the heterogeneity of 
item content, the more effective would be the 
key; and that the use of weighting methods 
properly applied would increase the degree of 
separation of groups. Using as criteria of a 
good key its ability to separate groups (as 
measured by per cent of overlap of distribu- 
tions) and its test-retest reliability, it is 
theoretically possible to demonstrate the im- 
portance of each of these points. From a 
practical standpoint, however, one must de- 
termine whether or not approximation meth- 
ods are usable, and, if so, to what extent these 
various factors need to be considered when 
employing these approximation methods. 

This report summarizes various methods 
of developing keys, and provides support for 
the following statements: 


1. When items are scored using unit 
weights, an optimum number of items can be 
found for scoring. For the samples used 
herein, this number seems to be between 40 
and 60; when either more or fewer items are 
scored, the discriminating power of the key 
is reduced. 

2. When item responses are weighted in the 
manner used by Strong in his Vocational In- 
terest Blank, the criterion group is not sepa- 
rated from the reference group as well as 
when unit scores using the optimum number 
of items are used. (This is not to say that 
some weighting system could not be devised 
which would be superior to unit scoring— 
obviously such a set of weights could be as- 
signed as to yield a score superior to any score 
by using multiple regression techniques. 
What this does say is that the method of 
weighting used by Strong is not superior to 
the method of unit weights.) 

3. When items are selected so as to in- 
crease the heterogeneity of content of a 
scoring key, the validity of that key is in- 
creased, and the test-retest reliability is some- 
what decreased. This is true whether items 
for such a key are selected by an iterative 
method as described in this report, or by an 
internal item analysis method. 


Received March 23, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Practice Effects on the Minnesota Vocational Test for 
Clerical Workers * 


Howard P. Longstaff 


University of Minnesota 


It is possible that the popularity of a re- 
liable and valid psychological test may be- 
come a weakness of that test at least in a 
given locality. There is some indication that 
such is the case with the Minnesota Voca- 
tional Test for Clerical Workers. In the se- 
lection of clerical workers this test has been 
a valuable aid to many business firms in Min- 
neapolis-St. Paul, Minnesota and elsewhere. 
Its extensive use in the above mentioned Min- 
nesota cities has resulted in many job appli- 
cants having taken the test several times. 
If the test is subject to “practice effects,” 
then the scores made by applicants who have 
taken it more than once become of question- 
able value. 

Since the Minnesota Vocational Test for 
Clerical Workers has been so widely used, 
the question has been raised as to what effect 
practice has upon the scores. Earlier studies 
indicated a normal practice effect of from 7 
to 12 per cent after time intervals of from 
three to six months (1). This may not seem 
a prohibitive effect for the time intervals in- 
volved, but in actual employment practice 
much shorter intervals of time are probably 
the rule. An applicant may apply for a job 
with several different companies within a mat- 
ter of hours or days. 

The purpose of this study was to measure 
practice effect on this test over relatively 
short intervals of time. Two groups of Uni- 
versity of Minnesota students in personnel 
psychology courses served as subjects. Group 
A was made up of 61 juniors, seniors and 
graduate students (41 men and 20 women). 
Group B was comprised of 36 Extension Di- 
vision students (24 men and 12 women) in 
an evening class. Group A was given the 
test successively on a Wednesday, Friday, 
and Monday, October 1, 3, and 6, 1952. 


* This research was made possible by a grant-in-aid
from the Graduate School of the University of
Minnesota.


Group B, which met only once a week, was 
given the test on three successive Monday 
evenings September 29, October 6 and 13, 
1952. The purpose of the study was ex- 
plained to both groups and they were en- 
couraged to make as much improvement as 
possible. Since the number of subjects in 
each group was small and the time intervals 
between testing were not great, results for the 
day and night groups are combined. 

Table 1 presents the results of the combined
groups. It is apparent that considerable
practice effects occur. All of the differences
between the means are significant at
the .1 per cent level. When considered from
the standpoint of what these differences mean
in terms of centile ranks we observe that the
mean scores on the original testing would
have had centile ranks on norms for employed
clerical workers below 50 while the
centile ranks of the mean performance when
the test was taken the third time range from
72 to 91 on the same norms.
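
The t values in Table 1 can be checked from the summary figures alone. The sketch below assumes the usual formula for the difference between correlated means and divides by N (not N - 1); the article does not say which computing formula was used, so agreement can only be approximate. The inputs are the Table 1 entries for men on the Numbers part, trial 1 versus trial 2, as read from the source table.

```python
from math import sqrt


def correlated_t(mean_diff, sd_first, sd_second, r, n):
    """t for the difference between two means from the same subjects.

    Uses var(diff) = s1^2 + s2^2 - 2*r*s1*s2 and a standard error of
    sqrt(var(diff) / n); both choices are assumptions of this sketch.
    """
    var_diff = sd_first ** 2 + sd_second ** 2 - 2 * r * sd_first * sd_second
    return mean_diff / sqrt(var_diff / n)


# Men, Numbers, trial 1 vs. trial 2: D = 23.9, S = 26.7 and 25.9, r = .81, N = 65.
t_men_numbers = correlated_t(23.9, 26.7, 25.9, 0.81, 65)
print(round(t_men_numbers, 1))
```

The result falls close to the tabled value of 12.0; the small gap is what one would expect from rounding in the published means, standard deviations, and correlation.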

A different type of analysis, presented in
Table 2, shows much the same thing as the
data in Table 1. On trial 3 from 91 to 97
per cent of the subjects reach or exceed the
mean score made on trial 1, indicating marked
improvement.

As has been shown elsewhere (1, 2, 3, 4,
5) there is a decided sex difference in
performance on this test. The subjects in this
study behave similarly, as shown in Table 3.
Comparing the results of men and women on
trial 1 it is apparent that the women are
superior to men. It is also obvious that this
difference is consistent on successive trials on
the test, i.e., comparing trial 1 for men with
trial 1 for women, trial 2 for men with trial 2
for women, and trial 3 for men with trial 3
for women. When successive trials for men
are compared with the original trial for
women the practice effect rather rapidly
overcomes the original differences, and by trial 3




Table 1

Means, Standard Deviations, Differences between Means, t’s, P’s, r’s and Centile Rank the Means
Would Have on Norms for Employed Clerical Workers

Part A. Combined Male Groups, N = 65

                          Numbers                          Names
                   Trial 1  Trial 2  Trial 3      Trial 1  Trial 2  Trial 3
M                   118.6    142.5    152.5        124.0    154.7    167.3
S                    26.7     25.9     25.4         29.3     24.4     23.6

                   1 vs. 2  1 vs. 3  2 vs. 3      1 vs. 2  1 vs. 3  2 vs. 3
D                    23.9     33.9     10.0         30.7     43.3     12.6
t                    12.0     14.7  [illegible]     18.1     15.5      7.9
P                    .001     .001     .001         .001     .001     .001
r                     .81      .74      .91          .89      .67      .85

                   Trial 1  Trial 2  Trial 3      Trial 1  Trial 2  Trial 3
Centile rank of
mean scores           27       62       72           47       81       91

Part B. Combined Female Groups, N = 32

                          Numbers                          Names
                   Trial 1  Trial 2  Trial 3      Trial 1  Trial 2  Trial 3
M                   137.0    157.9    170.2        142.7    170.8    178.8
S                    28.4     26.3     22.7         30.1     25.7     21.7

                   1 vs. 2  1 vs. 3  2 vs. 3      1 vs. 2  1 vs. 3  2 vs. 3
D                    20.9     33.2     12.3         28.1     36.1      8.0
t                     8.0     11.5      4.9         14.1     10.3      4.0
P                    .001     .001     .001         .001     .001     .001
r                     .86      .82      .84          .93      .77      .91

                   Trial 1  Trial 2  Trial 3      Trial 1  Trial 2  Trial 3
Centile rank of
mean scores           40       70       83           36       68       81
77 per cent (Numbers) and 80 per cent
(Names) of men scored as high as did the
women on trial 1. This is additional evidence
of the seriousness of the practice effect
on this test.


Discussion


The practice effect found on the Minnesota
Vocational Test for Clerical Workers
can be explained in part by the nature of the
test itself. First, the changed digits in the

Table 2

Percentage of Women and Men (Combined Day and Extension Groups) Who Reach or Exceed the Mean
on the First Trial, or the Second Trial or Subsequent Trials *

                              Numbers                  Names
                        Per Cent   Per Cent     Per Cent   Per Cent
Comparison              of Women   of Men       of Women   of Men
Trial 2 vs. trial 1        80         85           87         89
Trial 3 vs. trial 1        94         91           97         97
Trial 3 vs. trial 2        80         66           69         69

* Line one of Table 2 shows percentage reaching or exceeding on trial 2 their own mean on trial 1. Line two
shows same data for trial 3 compared to trial 1. Line three shows same data for trial 3 compared with trial 2.




Table 3

Percentage of Men (Combined Day and Extension
Groups) Who Reach or Exceed the Mean of the
Women (Combined Day and Extension Groups)
on the Various Trials of Taking the Test *

                         Numbers     Names
Comparison               Per Cent    Per Cent
(men vs. women)          of Men      of Men
Trial 1 vs. trial 1         32          30
Trial 2 vs. trial 1         55          60
Trial 3 vs. trial 1         77          80
Trial 2 vs. trial 2         30          30
Trial 3 vs. trial 2         40          50
Trial 3 vs. trial 3         30          40

* Line one of the table shows the percentage of men
who on trial one reach or exceed the mean of women on
trial one. Line two indicates the percentage of men
who on trial two reach or exceed the mean of women on
trial one. And line three gives the percentage of men
who on trial three reach or exceed the mean of women
on trial one. The remaining lines of the table present
similar comparisons for trial two with trial two, trial three
with trial two, and trial three with trial three.


number test all occur near the end of the 
second series of digits. If one catches on to 
this one can materially improve one’s score. 
Secondly, the items that are changed or are 
not changed tend to fall into patterns which 
may help a subject with good visual imagery
on repeated trials on the test. Thirdly, on 
the names part of the test memory may be 
an important factor in bringing about im- 
provement on successive trials, because the 
subjects may be able to remember a fairly 
large number of the name pairs that are 
changed. 

Practice effect is not a new phenomenon; it 
has been found on other psychological tests 
besides the Minnesota Vocational Test for 
Clerical Workers. Wherever found, it is 
likely to be a weakness in any test that is to 
be used in selecting employees. Especially is 
this true when it is difficult or nearly impos- 
sible to determine how many times a job ap- 
plicant has taken the test previously. If one 




could determine accurately how many times 
a subject had taken the test and how long a 
time interval had transpired between test- 
ings, correction factors could be worked out. 
But in the everyday world of employee selec- 
tion and placement, no reliable method of se- 
curing such information exists. Therefore, 
other ways of overcoming practice effects 
must be provided if a test subject to practice 
effect is to have maximum value. The use of 
alternate forms is one way to reduce this 
weakness in a test. 


Summary 


1. When the Minnesota Vocational Test 
for Clerical Workers is taken successively 
with short time intervals intervening, marked 
practice effects occur. 

2. With equal amounts of practice the sex 
difference on test performance remains about 
constant but with three practice trials men 
can practically equal the original perform- 
ance by women. 

3. Alternate forms of the test may over- 
come these weaknesses in the test. 


Received March 23, 1953. 


References 


1. Andrews, Dorothy M. and Paterson, D. G.
Manual, Minnesota Vocational Test for Clerical
Workers. New York: The Psychological
Corporation, 1946. Pp. 5.

2. Loevinger, Jane. An analysis of verbal and
numerical abilities at the junior high school level.
Unpublished Master’s thesis, University of
Minnesota, 1938.

3. Schneidler, Gwendolyn G. Grade and age norms
for the Minnesota Vocational Test for Clerical
Workers. Educ. psychol. Meas., 1941, 1,
143-156.

4. Schneidler, Gwendolyn G. and Paterson, D. G.
Sex differences in clerical aptitude. J. educ.
Psychol., 1942, 33, 303-309.

5. Thatcher, Meriam. Sex differences in clerical
ability at the fifth and sixth grade levels.
Unpublished paper, Department of Psychology,
University of Minnesota, 1940. Pp. 1-12.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Attitudes Toward Authority-Figures as Correlates of Motivation
Among Naval Aviation Cadets 1, 2


E. P. Hollander and John T. Bair 


U. S. Naval School of Aviation Medicine, Pensacola, Florida 


The interrelationship between attitudes and 
motivation has already been noted by a num- 
ber of observers (2, 6, 8, 10). In recent 
years, evidence of the pertinence of the atti- 
tude construct to behavioral criteria has been 
demonstrated best perhaps in the two-volume 
series entitled The American Soldier (9). 
Although much of the research reported in 
these volumes was concerned with service- 
induced attitudes, large segments of the work 
dealt with persisting attitudes derived from 
the serviceman’s reference groups external to 
the service. As a generalization, it might be 
said that in these studies attitudes were found 
to be functionally significant in determining 
the individual soldier’s orientation to military 
life and, accordingly, to his motivation (9, 
pp. 122-130). 


Problem 


This study set forth to determine whether 
certain attitudes which a Naval Aviation 
Cadet brings with him to the training pro- 
gram bear a relationship to his level of moti- 
vation in training. It is apparent, of course, 
that attitudes may be ordered in a hierarchy 
relative to their significance to this particular 
training situation. That is to say, one would 
hardly consider that just any attitudes would 
have significant relevance to motivation in 
this setting; on the other hand, it is apparent 
that attitudes toward study or discipline or 
flying may be of the utmost relevance. In 
evaluative fashion, then, one might arrive at 
a grouping of attitudes which are presumed 
to be of significance in relationship to the 
motivation of cadets in training. With this
in mind, it was considered that the area of
interpersonal attitudes would provide a fruitful
area for study. In particular, it was decided
that attitudes toward authority-figures,
in this case officer-instructors (flight and
ground school), would be an appropriate
beginning. The intent of the study was to derive
implications for further investigations as
well as to determine possible applications to
selection. The basic hypothesis asserted for
test was as follows: that attitudes toward
authority-figures would significantly differentiate
between cadets of “high” and “low”
motivation.

1 Opinions or conclusions contained in this report
are those of the authors. They are not to be construed
as necessarily reflecting the view or the endorsement
of the Navy Department.

2 The authors wish to acknowledge their indebtedness
to Dr. Brant Clark for his valuable assistance
in the formulation of this report and to Dr. Richard
Trumbull, Miss Marjorie Nicholson, and Mr. Calvin
Nelson who acted as independent coders of the data.


Procedure 


The measurement of attitudes, like the 
measurement of all psychological variables, 
offers challenging and oftentimes unique prob- 
lems. This is especially so where the attitude 
under scrutiny is both structurally complex 
and emotionally laden, in this case attitudes 
toward authority-figures. It soon became ap- 
parent that the traditional attitude scale was 
inadequate and inappropriate to the meas- 
urement of an attitude such as this. As a 
consequence, this technique was discarded in 
favor of the more flexible open-ended projec- 
tive questionnaire (7). 


The usefulness of this method of attitude- 
elicitation rests on the fact that it presents the 
individual with a relatively unstructured stimulus 
situation in which he may, with equanimity, and 
without being consciously aware of the process, 
bring forth feelings that might normally be re- 
pressed through social pressures and other forces. 
Thus, by the employment of this technique, the 
cadet who felt resentment toward an instructor 
might vent his feelings without fear of retribu- 
tion or guilt. The advantage of such a pro- 
cedure in a military setting is obvious. 

In its final form the questionnaire resembled 
superficially the form developed by Flanagan in 
his studies of “critical incidents” among Air 
Force personnel (3). That this was merely a 
resemblance should be re-emphasized, lest an 
erroneous impression be conveyed. The main 




intent of the investigation was to procure infor- 
mation about instructors only insofar as this in- 
formation revealed the attitudes of the cadet 
group under study. The format of the question- 
naire was essentially simple. It was presented 
to the subjects under conditions of anonymity 
with the inference that only information was be- 
ing solicited. In addition, subjects were spe- 
cifically asked not to divulge the names of the 
individuals about whom they were to write. The 
cover sheet of this questionnaire contained these 
instructions: “On each of the following pages 
you will be asked to write briefly about a person 
you have known while in the Naval Air Training 
Program. The instructions indicate that you are 
to relate just one incident which typified the atti- 
tudes and behavior which have led you to make 
a positive or negative judgment about this per- 
son. The incident, however, does not have to 
be the only one of its kind, nor must it have 
been the main basis for your evaluation of this 
person.” 

On the top of page one the following further 
instructions were given: “Think of the best in- 
structor you had during Pre-Flight or Flight 
Training. Give just one incident which typified 
the kind of attitudes and behavior which made 
you feel that he was the best. What were the 
specific details of his behavior in that particular 
situation?” 

On the top of page two these instructions were 
given: “Now think of the worst instructor you 
had during Pre-Flight or Flight Training. Here 
again, give just one incident which typified the 
kind of attitudes and behavior which made you 
feel that he was the worst. What were the specific
details of his behavior in that particular
situation?”


Methodologically, two points deserve clarification:
first, the “best”-“worst” dichotomy
was utilized in an effort to secure a degree of 
polarization of response which would readily 
yield to differential analysis; second, the in- 
strument was administered under rigorous 
conditions of anonymity so as to minimize 
any implied threat. 

For purposes of this investigation, motiva- 
tion was defined operationally. Cadets of 
“high” motivation were considered to be 
those who had successfully completed the 
basic flight stage of the Naval Air Training 
Program.3 Cadets of “low” motivation were
those who voluntarily withdrew from the program
during this stage.

3 The program is divided into three major phases:
Pre-Flight, Basic Flight, and Advanced Flight. In
virtually all cases, cadets who have completed Basic
have been in training for one year or more. By this
time attrition is minimal and the likelihood of success
is very high.

During a three months period in the fall 
of 1951, the questionnaire was administered 
to a total sample of 137 cadets classified as 
follows: 72 cadets who were leaving training 
at their own request (the “low” motivation 
group) and 65 cadets who had successfully 
completed basic flight training (the “high” 
motivation group). In both instances ad- 
ministration of the questionnaire was part of 
a routine check-out procedure and was usu- 
ally carried on with small groups numbering 
five or less. 

A summary comparison of the two cri- 
terion groups will be found in Table 1. With 
respect to age and active duty time before 
entering training they were quite comparable.
On the whole, however, cadets dropping
at their own request tended to have a
significantly greater amount of formal education
prior to training. This latter finding
corroborates, in part, certain of the results 
growing out of a previous report from this 
command (1). 

Following the administration procedures,
responses to the questionnaire form were
abstracted so as to yield only core phraseology
relevant to the instructor’s behavior and the
cadet’s reaction to this behavior. These
abstracts were thereupon transcribed on 3 X
cards and assigned code numbers at random
so as to eliminate insofar as was possible
subjective bias in the content analysis
procedure which followed. Thus, at no time
during the categorization of this data did the 
judges know the disposition of the cadet 
whose response was in hand. 

As a next step, all of the responses to the
two instructional “sets,” that is, “best” and
“worst” instructor, were sifted to secure
descriptive elements of behavior. From these,
a number of categories of behavior were
developed, which subsumed behavioral elements
of similar quality and as much as possible
used the language of the respondents rather
than that of the investigators. In every
instance, these categories were developed
independently of one another in terms of an
either-or criterion. That is, either the
behavior was described in the response or it



Table 1

Summary Comparison of Motivation Criterion Groups with Regard to Age, Previous Education,
and Previous Military Service

                                                  Previous               Active
                               Age                Education*             Military Duty
                               (Years)            (College Semesters)    (Months)
Group                          Mean    S.D.       Mean    S.D.           Mean    S.D.
High Motivation
  (Successful) N = 65          22.0    1.5        3.3     2.6            21.8    13.2
Low Motivation
  (Withdrawal) N = 72          22.2    1.4        5.5     2.5            20.6    14.0

* The t test of significance for the difference between the means for the college education variable was found
to be 4.89. This is significant at the 1% level of confidence.


was not. Thus, overlap was possible and, 
indeed, very frequently took place—but only 
as a result of the respondent’s having men- 
tioned more than one major behavioral ele- 
ment. 

As a check on reliability of judgment, 
three independent coders were asked to dis- 
criminate one category of behavior, for each 
of the two instructional sets, within the total 
population of responses to that set. Percent- 
ages of agreement with the principal investi- 
gators and the three independent coders were 
computed for the response categories selected 
for the reliability check. All were found to 
reach an acceptable level.* 


Results 


Frequency of response under each major 
category for the two cadet groups was sub- 
jected to a chi-square analysis. Table 2 pre- 
sents the findings of this procedure comparing 
the major categories of “best” and “worst” 
instructor behavior for the successful and 
withdrawal cadet responses. In general, Ta- 
ble 2 reveals that the cadets of high motiva- 
tion tend to manifest attitudes toward the 
interpersonal quality of instructor behavior 
while those of low motivation, on the other 
hand, tend to show attitudes directed at the 
instructor’s success or failure in his role as a 
teacher. Close scrutiny of this table indicates
that under the “best” instructor set,
cadets of the “high” motivation group
responded with significantly greater frequency
within the categories of personal interest and
patience than did the cadets in the “low”
motivation group. On the other hand, the
“low” group, under this same set, responded
with significantly greater frequency than did
the “high” group within the categories good
instructional techniques and extra help. Under
the “worst” instructor set, the “high”
motivation group reacted with significantly
greater frequency than the “low” group
within the category verbal assault and with
significantly less frequency than the “low”
group within the category poor instructional
techniques. No significant difference between
the groups was found within the indifference
category, under this set.

* Two response categories were checked. The
percentages of agreement for the response category
patience were .96, .87, and .88 for each of the
independent coders; for verbal assault the percentages
were .95, .94, and .93, respectively.
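
The chi-square values in Table 2 can be approximated from the published percentages and group sizes. The sketch below assumes the ordinary fourfold chi-square without a continuity correction and reconstructs the cell counts by rounding the percentages; neither detail is stated in the article, and the function names are hypothetical. The inputs are the Table 2 entries for patience (43 per cent of 65 versus 20 per cent of 69) and verbal assault (61 per cent of 62 versus 21 per cent of 70).

```python
def counts_from_percent(percent, n):
    """Reconstruct (mentioned, not mentioned) counts from a rounded percentage."""
    mentioned = round(percent * n / 100)
    return mentioned, n - mentioned


def chi_square_2x2(a, b, c, d):
    """Fourfold chi-square, no continuity correction (an assumption here).

    Rows are the two groups; columns are mentioned / not mentioned.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# "Best" instructor set, Indicated Patience.
chi_patience = chi_square_2x2(*counts_from_percent(43, 65),
                              *counts_from_percent(20, 69))

# "Worst" instructor set, Manifested Verbal Assault.
chi_assault = chi_square_2x2(*counts_from_percent(61, 62),
                             *counts_from_percent(21, 70))

print(round(chi_patience, 2), round(chi_assault, 2))
```

Under these assumptions both values agree with the tabled 8.08 and 21.74 to two decimals, which suggests, without proving, that an uncorrected chi-square on the raw frequencies was the computation used.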


Discussion 


The results indicate that differences of attitude
toward authority-figures do exist between
cadets of “high” and “low” motivation. The
hypothesis, therefore, was substantiated.
Specifically, it would appear that there is a
degree of variation in identification with
instructors between cadets of the two criterion
groups. Indeed, it may be that this process
of identification may account for the
differences obtained.

While it was initially considered that the 
attitudes studied here were brought by the 
cadets to the training program, one might 
properly question the actual temporal rela- 
tionships involved. That is, were the atti- 
tudes toward these authority-figures brought 


Table 2

Chi-Square Analysis and Significance Levels between the Motivation Criterion Groups
for the Major Response Categories

“Best” Instructor

                                            Per Cent*
                                        High       Low
                                        Group      Group
Response Category                       (N=65)     (N=69)        χ²       P
Showed Personal Interest                  75     [illegible]    12.91    <.001
Indicated Patience                        43         20          8.08    <.01
Used Good Instructional Techniques        37         63          9.65    <.01
Gave Extra Help                            9         26          6.49    <.02

“Worst” Instructor

                                            Per Cent*
                                        High       Low
                                        Group      Group
Response Category                       (N=62)     (N=70)        χ²       P
Manifested Verbal Assault                 61         21         21.74    <.001
Used Poor Instructional Techniques        18         40          7.83    <.01
Indicated Indifference                    42         37          1.22    >.30

* The N’s given here for the groups represent the actual number of people in the criterion groups who
responded to the “set.”


to the training situation, or were they condi- 
tioned mainly by experiences in training? A 
research project has recently been completed 
(5) which essentially duplicated the current 
investigation in order to provide an answer 
to this question. In this study, cadets just 
entering training were given a similar ques- 
tionnaire form in which they were asked to 
give parallel information on previously en- 
countered authority-figures, that is, high 
school or college instructors. The results of 
this study indicate quite conclusively that 
attitudes of the cadets who subsequently 
withdrew from training were similar to those 
of the “low” motivation group of the present 
study. This group tended to describe the 
skill or lack of skill of their high school or 
college instructor in his role as a teacher at 
a significantly higher level than did the ca- 
dets who remained in training. Thus, it ap- 
pears that attitudes toward authority-figures 
are among the attitudes persistently held by 
the cadets, and are related to their level of 
motivation in the Naval Air Training Pro- 
gram, A number of related investigations 


designed to articulate this relationship still 
further are now being conducted. On the 
whole, it would seem that this attitude-elic” 
tation technique bears further scrutiny as a 
possible device for the assessment of motiva 
tion in a number of settings. 
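The chi-square values in Table 2 can be checked approximately from the published percentages. The sketch below is only an illustration: it assumes an uncorrected 2 x 2 Pearson chi-square (mentioned versus not mentioned, by criterion group) and converts percentages back to counts by rounding. Neither assumption is stated in the article, and the function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def counts_from_percent(percent, n):
    """Turn a reported percentage into (mentioned, not mentioned) counts (assumed rounding)."""
    mentioned = round(percent * n / 100)
    return mentioned, n - mentioned


# "Indicated Patience": 43 per cent of 65 "high" cadets, 20 per cent of 69 "low" cadets.
high = counts_from_percent(43, 65)
low = counts_from_percent(20, 69)
patience_chi2 = chi_square_2x2(high[0], high[1], low[0], low[1])
print(round(patience_chi2, 2))  # close to the 8.08 reported in Table 2
```

Under these assumptions the recomputed value agrees with the tabled figure to two decimals, which suggests, but does not prove, that no continuity correction was applied.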


Summary 


This paper reports on attitudes toward authority-figures which discriminated between Naval Aviation Cadets of “high” and “low” motivation. The “high” motivation group consisted of 65 cadets who had successfully completed Basic Flight Training, and the “low” group consisted of 72 cadets who were withdrawing from training voluntarily. Both groups were required to complete anonymously an open-ended questionnaire form which required them to describe a sample of behavior characteristic of their “best” [...] motivation tended to manifest attitudes concerning interpersonal relationships with their officer-instructors while the “low” group stressed competence of the instructor in his role as a teacher. Interpretations were suggested with respect to cadet identification with authority-figures as a motivational factor in this setting.


Received April 23, 1953. 


References 


1. Bair, J. T. Non-test predictors of attrition in 
the Naval Air Training Program. Project 
Number NM 001 058.05.02. U. S. Naval 
School of Aviation Medicine, Pensacola, Flor- 
ida, 28 April 1952. 

2. Cantril, H. The psychology of social move- 
ments. New York: John Wiley, 1941. 

3. Flanagan, J. C. AAF Aviation Psychology Pro- 
gram, Research Report No. 1, Washington, 
D. C., U. S. Government Printing Office, 1948. 

4. Hollander, E. P. and Bair, J. T. The significance of attitudes toward authority-figures in discriminating between Naval Aviation Cadets of “high” and “low” motivation. Project Number NM 001 058.05.03. U. S. Naval School of Aviation Medicine, Pensacola, Florida, 27 May 1952.

5. Hollander, E. P. and Bair, J. T. Pre-Training attitudes toward authority-figures as predictors of inadequate motivation among Naval Aviation Cadets. Project Number NM 001 058.05.05. U. S. Naval School of Aviation Medicine, Pensacola, Florida, 10 November 1952.

6. Newcomb, T. Social psychology. New York: The Dryden Press, 1950.

7. Saenger, G. and Proshansky, H. Projective techniques in the service of attitude research. Personality, 1950, Symposium No. 2, pp. 23-24.

8. Sherif, M. An outline of social psychology. New York: Harper, 1948.

9. Stouffer, S. A., et al. The American soldier. (Vols. I and II), Princeton: Princeton University Press, 1949.

10. Thomas, W. I. A theory of social personality, Chapter IX in Social behavior and personality. (Edited by E. H. Volkart.) New York: Social Science Research Council, 1951.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Readability of Employee’s Letters in Relation to Occupational 
Level 


Arthur C. MacKinney and James J. Jenkins 


University of Minnesota


In any form of written communication it is obviously of great importance to have information concerning the reading ability of the audience and to use that information in the communication process. Since 1948 (10), a large number of articles have appeared in the psychological literature and elsewhere stressing the need to simplify and make more readable the communications which managements direct to their employees. (See the bibliography by Hotchkiss and Paterson [8].) These articles have suggested the use of readability formulas (most popularly those presented by Flesch) as one means of controlling the level of the communication and thus attaining the goal of better understanding. Many writers, following the lead of Flesch (5, 6), have used educational achievement as a base from which the comprehension ability of the audience may be estimated. Other writers have contributed the results of reading comprehension tests given to selected samples of special audiences. Here, for example, one finds the work by Bellows and Palmer (1) and Colby and Tiffin (2) on the reading levels of foremen and supervisors. For the most part, however, some indirect estimation procedure must be used since the results of applying reading comprehension tests to rank-and-file employees in industry are not available.

This paper has a two-fold purpose: first, to advance tentatively another estimation procedure and, second, to consider in its own right the data revealed by this technique. It was hypothesized that the readability level of employee-written communications should reflect the effective literacy level of the employees. It was further hypothesized that literacy level increases (as do education and intelligence) with higher occupational levels. This would mean, then, that the readability difficulty of employee-written letters as measured by the Flesch formula should increase as occupational level increases. Briefly, it was our belief that in general the complexity of one’s writing provides an indirect index to the complexity of material which one can readily comprehend and that, since it is generally agreed that reading ability increases with occupational level, complexity of writing will increase also.


Method 


A total of 400 employee-written letters were made available from the General Motors “My Job Contest” (Evans and Laseau [3]).¹ These letters were randomly drawn from a 10 per cent sample of the 174,854 letters received in this contest. While these letters are not “typical” writing samples from the employees, they are letters written under standard stimulus conditions and hence are uniquely comparable.


Average sentence length, syllable counts, and Flesch Reading Ease (RE) scores were determined for each of the letters on the basis of a 100-word sample from each letter. In 67 instances the letters contained fewer than 100 words, so the RE scores were determined by prorating these on the basis of the total words available in that letter. The average length of these prorated letters was 7[?] words. All counting was done independently of salary level information.
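The scoring step can be sketched in a few lines. The article does not print the formula; the constants below are those of Flesch's published Reading Ease formula (RE = 206.835 - .846 wl - 1.015 sl, with wl the syllables per 100 words and sl the average sentence length in words). The function name and the sample counts are hypothetical, and prorating a short letter is assumed here to mean scaling the syllable count to a 100-word basis.

```python
def flesch_reading_ease(words, syllables, sentences):
    """Flesch Reading Ease for one writing sample.

    words, syllables and sentences are raw counts for the sample. Samples
    shorter than 100 words are prorated by scaling syllables to 100 words.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    syllables_per_100_words = 100.0 * syllables / words
    average_sentence_length = words / sentences
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * average_sentence_length


# Hypothetical counts: a full 100-word sample and a prorated 70-word letter.
full_sample = flesch_reading_ease(words=100, syllables=150, sentences=5)
short_letter = flesch_reading_ease(words=70, syllables=105, sentences=4)
print(round(full_sample, 1), round(short_letter, 1))
```

Because the sentence count enters through a ratio, one sentence more or fewer in a 100-word sample shifts the score noticeably, which is the source of error the authors acknowledge.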

It is to be noted in connection with this analysis that the determination of average sentence length for use in the Flesch RE formula is done on the basis of separate ideas, independent of punctuation (which was of dubious accuracy at best in these letters). This admittedly could introduce a source of error since a change of one sentence in the 100-word sample changes the average sentence length and the RE score markedly. However, the reliability of the Flesch RE measures has been shown to be quite satisfactory (7).

¹ The writers wish to express their appreciation for the cooperation of the Employee Research Section, General Motors Corporation, and especially to Chester E. Evans. The 400 employee letters and the occupational descriptions used in this article were furnished by that organization.

Following the determination of the RE scores, letters were then classified by occupational level of their writers. There were two major groupings, the salaried and the hourly employees.

The salaried group included the “skilled group with responsibilities added,” the “skilled group,” and the “partially skilled group.” Originally the salaried group included “learners” but this category was eliminated because of the small number of cases. The salary group was generally defined as follows:


“Sub-managerial and clerical occupations 
involving supervising, coordinating, guiding, 
and performance of general clerical work. 
Primarily concerned with preparation, tran- 
scription, systematizing and filing of oral and 
written communications in offices, shops, and 
other places where such functions are per- 
formed.” 

The group of hourly employees was di- 
vided in accordance with the traditional clas- 
sification into “skilled,” “semiskilled,” and 
“unskilled.” These were defined as follows: 

“Skilled: Includes craft and manual occu- 
pations that require predominantly a thor- 
ough and comprehensive knowledge of proc- 
esses involved in the work, exercise of con- 
siderable independent judgment, usually a 
high degree of manual dexterity, and, in some 
instances, extensive responsibility for prod- 
ucts and equipment. Employees in these oc- 
cupations often become qualified through ap- 
prenticeship or extensive training periods.” 

“Semiskilled: The exercise of manipula- 
tive ability of a high order within a fairly 
well-defined work sequence. The major re- 
liance, not so much upon the employee’s 
judgment or dexterity, but vigilance and 
alertness, in situations in which lapses in 
performance would damage equipment or
product. These occupations may require the 
limited performance of part of a craft or 
skilled occupation.” 


“Unskilled: Manual occupation involving 
performance of simple duties which can be 
learned in a short period of time. Little or 
no independent judgment is required and such 
occupations require no similar job experi- 
ence.” 

Some letters were dropped from the sam- 
ple at each stage of the analysis. In all, 26 
cases were discarded leaving a total of 374 
letters for final analysis. As stated before, 
letters by “learners” were discarded. Occu- 
pational classifications were not available or 
were in doubt for several of the letters. 


Results 


Mean and standard deviation of RE scores 
were calculated for each of the occupational 
groups. These are presented in Table 1. 
Analysis of variance applied to the means of 
these groups yielded an F value of 10.61 
which is, of course, significant far beyond 
the .01 level. 
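As a rough check, a one-way analysis of variance can be recomputed from the group sizes, means, and standard deviations given in Table 1. This is a sketch under stated assumptions: the published S.D.s are treated as sample standard deviations (n - 1 denominator), which the article does not specify, and the tabled figures are rounded, so the result should only land near the reported F of 10.61.

```python
# (label, N, mean RE, S.D.) as given in Table 1
GROUPS = [
    ("Salaried: Skilled with Responsibilities", 18, 53.7, 10.8),
    ("Salaried: Skilled", 17, 53.6, 15.5),
    ("Salaried: Partially Skilled", 21, 61.7, 14.0),
    ("Hourly: Skilled", 51, 64.0, 12.9),
    ("Hourly: Semiskilled", 218, 69.1, 14.1),
    ("Hourly: Unskilled", 49, 72.9, 12.1),
]


def f_from_summary(groups):
    """One-way ANOVA F ratio from per-group N, mean and S.D."""
    total_n = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)  # assumes sample S.D.
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = f_from_summary(GROUPS)
print(round(f_ratio, 2), df_between, df_within)
```

With 5 and 368 degrees of freedom, an F of this size lies far beyond the .01 level, in line with the authors' statement.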

An inspection of Table 1 reveals that a clear hierarchy of mean RE scores is evident not only between the major groups but within them as well. The means for the “skilled” salaried people place them at Flesch’s “Fairly Difficult” level, typical of a quality magazine, indicating reading achievement levels from 10th to 12th grade and requiring some high school for understanding. The “partially skilled” salaried employees and the “skilled” hourly employees write at a mean level equivalent to the digests, “Standard,” indicating reading achievement within the 8th and 9th grade levels which requires the completion of 7th or 8th grade for understanding according to Flesch. The “semiskilled” and “unskilled” hourly workers write at a mean level which is like slick fiction, “Fairly Easy,” indicating a reading achievement of 7th grade and requiring completion of 6th grade for understanding.


Table 1

Reading Ease Scores of Employee Letters

Occupational Classification          N      Mean    S.D.
Salaried Employees:
  Skilled with Responsibilities      18     53.7    10.8
  Skilled                            17     53.6    15.5
  Partially Skilled                  21     61.7    14.0
Hourly Employees:
  Skilled                            51     64.0    12.9
  Semiskilled                        218    69.1    14.1
  Unskilled                          49     72.9    12.1


Table 2

Percentage of Employees in Each Occupational Group Writing at Each Reading Ease Level

                                          Flesch Reading Ease Levels
Occupational                          VD     D      FD     S      FE     E      VE
Classification                  N     0-29   30-49  50-59  60-69  70-79  80-89  90-100  Total
Salaried:
  Skilled with Responsibilities 18    -      44.4   27.8   22.2    5.6   -      -       100.0
  Skilled                       17    -      41.2   17.6   17.6   23.5   -      -       [?]
  Partially Skilled             21    -      19.0   23.8   23.8   28.6    4.8   -       [?]
  All Salaried                  56    -      33.9   23.2   21.4   19.6    1.8   -       [?]
Hourly:
  Skilled                       51    2.0    11.8   25.5   25.5   25.5    5.9    3.9    [?]
  Semiskilled                   218   0.5    11.9   13.3   18.3   30.7   20.2    5.0    [?]
  Unskilled                     49    -       2.0   14.3   18.4   36.7   20.4    8.2    [?]
  All Hourly                    318   0.6    10.4   15.4   19.5   30.8   17.9    5.3    100.0
All Employees                   374   0.5    13.9   16.6   19.8   29.1   15.5    4.5    100.0

More revealing is the tabulation of the percentages of persons of each occupational level who wrote at each of Flesch’s readability levels. These data are summarized in Table 2.

Here again the progression of reading ease scores over the occupational level hierarchy is striking. For example, it may be seen that 44 per cent of the “skilled with responsibilities” salaried group write at the “Difficult” level while only 2 per cent of the unskilled (hourly) group write at this same level.

To further facilitate use of these data, the percentages of Table 2 were cumulated for each occupational level. The results are presented in Table 3.


Table 3

Cumulative Percentage of Employees in Each Occupational Group Writing at Each Reading Ease Level

                                          Flesch Reading Ease Levels
Occupational                          VD     D      FD     S      FE     E      VE
Classification                  N     0-29   30-49  50-59  60-69  70-79  80-89  90-100
Salaried:
  Skilled with Responsibilities 18    -      44.4   72.2   94.4   100    100    100
  Skilled                       17    -      41.2   58.8   76.4   100    100    100
  Partially Skilled             21    -      19.0   42.8   66.6   95.2   100    100
  All Salaried                  56    -      33.9   57.1   78.5   98.1   100    100
Hourly:
  Skilled                       51    2.0    13.8   39.3   64.8   90.3   96.2   100
  Semiskilled                   218   0.5    12.4   25.7   44.0   74.7   94.9   100
  Unskilled                     49    -       2.0   16.3   34.7   71.4   91.8   100
  All Hourly                    318   0.6    11.0   26.4   45.9   76.7   94.6   100
Total                           374   0.5    14.4   31.0   50.8   79.9   95.4   100


Discussion


The data, as presented, are of descriptive interest just as they stand. However, a crucial question remains. Since these letters are samples of employees’ writing, does this really indicate the reading comprehension level of these same employees? There is no rigorous answer to this question at the present time. A consideration of the writing process as opposed to the reading process, production of a word as opposed to its recognition, the special conditions of a contest with very substantial prizes, the special pressures on the individual to make some kind of an entry so his group might receive a participation award, and the possibility that many of the letters were written with help from members of one’s family, neighbors, etc., preclude any realistic discussion of whether an individual writes at a level higher than, lower than, or similar to his reading comprehension level.

Some supporting evidence, however, inclines the writers to the view that this is representative writing and that it is indicative of minimal reading skill. First, repeated analyses of house organs (presumably written by salaried employees who are “skilled with responsibilities”) show their mean level to be very close to that indicated in this study. In general, their writing averages RE scores of about 50 (4, 11) as compared to the average of 54 obtained in this study. Second, the study by Bellows and Palmer (1) of reading comprehension of foremen (who are presumably like the “skilled” hourly worker) seems to match very closely the data obtained in this study for this group. Their data are presented in modified form in Table 4 for comparison with the group from this study. Colby and Tiffin (2) find the median reading grade for factory supervisors to be the 10th grade level while this study shows a median in the 9th grade level.

If one accepts the data from these letters 
as reflecting the minimal effective literacy 
level of the employees, then this industrial 
audience has been somewhat more clearly 
delineated. To reach 95 per cent of all em- 
ployees for example, Table 3 indicates it 
would be necessary to write at the “Easy” 
level of 80 to 90 (pulp fiction). This is the 
level which Flesch has predicted would reach 
91 per cent of the adult population. 
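The reading of Table 3 described above amounts to cumulating a row of Table 2 and finding the first Reading Ease level at which the running total reaches the desired share of the audience. A minimal sketch, using the “All Employees” and “Unskilled” rows of Table 2 (function and variable names are hypothetical):

```python
from itertools import accumulate

LEVELS = ["VD", "D", "FD", "S", "FE", "E", "VE"]

# Percentages writing at each level, from Table 2.
ALL_EMPLOYEES = [0.5, 13.9, 16.6, 19.8, 29.1, 15.5, 4.5]
UNSKILLED = [0.0, 2.0, 14.3, 18.4, 36.7, 20.4, 8.2]


def cumulate(percentages):
    """Running totals of a Table 2 row, rounded to one decimal as in Table 3."""
    return [round(total, 1) for total in accumulate(percentages)]


def level_needed(percentages, target):
    """Hardest Flesch level that still reaches at least `target` per cent."""
    for level, reached in zip(LEVELS, cumulate(percentages)):
        if reached >= target:
            return level
    return LEVELS[-1]


print(cumulate(ALL_EMPLOYEES))
print(level_needed(ALL_EMPLOYEES, 95))  # "E", the "Easy" level named in the text
print(cumulate(UNSKILLED)[LEVELS.index("FE")])
```

The same lookup applied to the unskilled row shows about 71 per cent reached at the “Fairly Easy” level.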

On the other hand, if one were concerned only with reaching the top salary-level group represented here (skilled with responsibilities added), 94 per cent of that group would find the “Standard” level within their reading comprehension. This would be a Reading Ease level of 60 to 70 and is typical of digest magazines.


Table 4

Reading Comprehension Grade of Foremen as Measured by Bellows and Palmer in Comparison with
Estimated Comprehension Level of Skilled Sample in this Study

Reading Comprehension    Per Cent of Foremen       Per Cent of Skilled
Grade Level              (Bellows and Palmer)      (Estimated from RE Scores)
                         N = 100                   N = 51
16+                       4                         2.0
13-16                    21                        11.8
10-12                    26                        25.5
8-9                      27                        25.5
7                         6                        25.5
6                         8                         5.9
4-5                       8                         3.9

Writing at the “Fairly Easy” level (typical 
of slick fiction; Reading Ease 70 to 80) 
would be easily understood by 80 per cent of 
all employees. It would be well within the 
grasp of almost all salaried employees. How- 
ever, only 71 per cent of the unskilled em- 
ployees would readily comprehend this “Fairly 
Easy” level of writing. 

It is interesting that this standard of RE 
of 70 or easier was recommended by Paterson 
and Walker (11), Farr, Paterson, and Stone 
(4), and by Lauer and Paterson (9) in their 
studies of industrial communications intended 
for “rank-and-file employees.” 


Summary 


A total of 400 employee letters were ran- 
domly drawn from the 10 per cent sample of 
letters received in the General Motors “My 
Job Contest.” One 100-word sample from 
each letter was analyzed by the Flesch Read- 
ing Ease formula. The letters were then 
sorted by occupational level of the writer. 
The mean RE score and the standard devia- 
tion were computed for each of six occupa- 
tional levels. Mean differences between the 
groups were highly significant. 

A hierarchy of mean RE scores was found to exist ranging from a mean of 54 (Fairly Difficult) for the “skilled” salary groups to a mean of 73 (Fairly Easy) for the “unskilled” hourly employees. A table showing the percentage of each group writing at each RE level was prepared to more fully describe the distributions. Some evidence suggesting that the writing was representative and indicative of comprehension level was presented. The results were interpreted as confirming previous readability studies of industrial communications and as providing a guide for the preparation of industrial communications.

Received April 2, 1953.


References 


1. Bellows, R. M. and Palmer, D. H. Unpublished study. Cited in Bellows, R. M., Psychology of personnel in business and industry. New York: Prentice-Hall, 1949, p. 499.

2. Colby, A. N. and Tiffin, J. Reading ability of industrial supervisors. Personnel, 1950, 27, 156-159.

3. Evans, C. E. and Laseau, L. N. My job contest. Personnel Psychol. Monogr., 1950, No. 1.

4. Farr, J. N., Paterson, D. G., and Stone, C. H. Readability and human interest of management and union publications. Industr. Labor Relat. Rev., 1950, 4, 88-93.

5. Flesch, R. F. The art of plain talk. New York: Harper, 1946, p. 210.

6. Flesch, R. F. The art of readable writing. New York: Harper, 1949, p. 499.

7. Hayes, Patricia M., Jenkins, J. J., and Walker, B. J. Reliability of the Flesch readability formulas. J. appl. Psychol., 1950, 34, 22-26.

8. Hotchkiss, S. N. and Paterson, D. G. Flesch readability reading list. Personnel Psychol., 1950, 3, 327-344.

9. Lauer, Jeanne and Paterson, D. G. Readability of union contracts. Personnel, 1951, 28, 3-7.

10. Paterson, D. G. and Jenkins, J. J. Communication between management and workers. J. appl. Psychol., 1948, 32, 71-80.

11. Paterson, D. G. and Walker, B. J. Readability and human interest of house organs. Personnel, 1949, 25, 438-441.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Scaling Procedures Based on the Method of Paired Comparisons 


Sam L. Witryol
Department of Psychology, The University of Connecticut 


The primary purpose of this paper is to 
present an experimental comparison of three 
scaling approaches to the method of paired 
comparisons: Thurstone’s Case III and Case 
V, and Guilford’s Short Cut. Recent litera- 
ture pertaining to a variety of related prac- 
tical and theoretical developments will also 
be briefly reviewed. 

There appears to be a resurgence of inter- 
est on the part of investigators in many areas 
of psychology in the applicability of the 
method of paired comparisons to practical 
scaling problems. Fortunately some impor- 
tant original contributions, clarifying meas- 
urement problems and re-examining basic as- 
sumptions, have also recently been published. 
The work of Mosteller (21, 22, 23, 24) is 
exemplary and constitutes, in the opinion of 
the writer, the most brilliant rational discus- 
sion of paired-comparison scaling features 
since Thurstone’s early developments (26, 27, 
28). 

In a previous investigation (30), the writer 
employed Thurstone’s Case V for scaling 
paired-comparison data on teacher-generated 
motivational values in the classroom. The 
Case V scale values from four different class- 
room groups were compared with the values 
from the same samples scaled by Case III. 
The correlations between these scale values 
obtained by these two methods from four 
sets of data were essentially unity, despite 
the fact that one of the assumptions for Case 
V (equal discriminal dispersions) appeared 
to have been violated. In the present investigation the values obtained from these data by means of Thurstone’s Case III and Case V approaches were compared with those obtained by Guilford’s Short-Cut method (10). The Guilford method was rejected by the writer in the earlier study because it seemed to lack a defensible rationale for the discriminal unit. This decision will be re-examined here in the light of experimental findings from the present study and from related researches.




Gulliksen (11) has discussed the broad 
scaling characteristics and power of the 
method of paired comparisons, and Burros 
(4) has pointed out that this “psychophysi- 
cal” procedure has special value for scaling 
stimuli when the “physical” correlates are 
not easily discernible. Coombs (5) has 
noted that the data in most scaling experi- 
ments are qualitative in nature. With these 
considerations in mind, it is worthwhile to 
evaluate the method of paired comparisons 
in terms of generality of measurement to 
various types of qualitative data and also in 
terms of economy of application. 


Previous Research Findings 


In the present writer’s earlier investigation 
(30), the scaling rationale developed by 
Thurstone for Case III and for Case V, and 
the rationale developed by Guilford for his 
Short-Cut method were described in some 
detail. The major relevant features will be 
briefly reviewed here. Thurstone (26) ex- 
amined various assumptions under five spe- 
cial cases for the application of the method 
of paired comparisons to scaling the psycho- 
physical law of comparative judgment. The 
choice of these Thurstone scaling methods 
generally hinges upon the selection of either 
Case III or Case V, based upon whether the 
stimulus dispersions are approximately equal 
or unequal. Thurstone admitted that these 
dispersions could never be directly observed 
(27), and later developed a statistical pro- 
cedure for approximating the measurement 
of the “ambiguity of each stimulus” (28). 
The important practical consideration was the fact that Case V, a much simpler scaling device, was indicated by Thurstone to be applicable if the stimulus dispersions were approximately equal.
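For orientation, the Case V computation can be sketched as follows. This is the usual textbook form of the procedure, not a transcription of the writer's worksheets: each observed proportion is converted to a unit-normal deviate, the deviates are averaged by column, and the lowest stimulus is set to zero (as in the tables of this paper). The function name, the clipping of extreme proportions, and the small example are assumptions made for illustration.

```python
from statistics import NormalDist


def case_v_scale_values(proportions, clip=0.01):
    """Thurstone Case V scale values from a paired-comparison proportion matrix.

    proportions[i][j] is the proportion of judges choosing stimulus j over
    stimulus i. Proportions are clipped to [clip, 1 - clip] (an assumption)
    so that the normal deviate stays finite.
    """
    n = len(proportions)
    unit_normal = NormalDist()
    deviates = [
        [
            0.0 if i == j
            else unit_normal.inv_cdf(min(max(proportions[i][j], clip), 1 - clip))
            for j in range(n)
        ]
        for i in range(n)
    ]
    column_means = [sum(deviates[i][j] for i in range(n)) / n for j in range(n)]
    lowest = min(column_means)
    return [value - lowest for value in column_means]


# Hypothetical check: proportions generated from known scale values 0, .5, 1.0.
true_values = [0.0, 0.5, 1.0]
example = [
    [NormalDist().cdf(true_values[j] - true_values[i]) for j in range(3)]
    for i in range(3)
]
recovered = case_v_scale_values(example)
print([round(value, 2) for value in recovered])
```

When the proportions follow the Case V model exactly, the procedure returns the generating scale values, which is what the example verifies.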

As a laborsaving alternative to Case V, a Short-Cut method was devised by Guilford (8). The results with this procedure yielded very high correlations with the results obtained from Thurstone’s Case V, but Guilford was unable immediately to present an adequate mathematical and psychological justification. Later, he attempted this latter task, but he could not develop a unit for the psychological scale (9).

Recent empirical findings have been suggestive. Edwards (6) reported a very high correlation, approaching unity, between values obtained with Case III and Case V, but did not indicate the values for the stimulus dispersions. Koch¹ (17) also reported very high correlations (about +.99) between scale values obtained with Case V and Guilford’s Short Cut. Satter (25) found Guilford’s approach to be a highly reliable method for job evaluation. These findings are in general consistent with the research experience (30, 31, 32) of the writer as well as with the experimental findings to be reported in the present investigation.

However, an adequate rationale is not 
readily ascertainable from a review of the 
empirical findings, and one must turn to the 
recent brilliant efforts of Mosteller (21, 22, 
23, 24) for the most productive and provoca- 
tive leads. Mosteller presents a careful 
mathematical rationale which makes it pos- 
sible to relax restrictions previously consid- 
ered basic assumptions for the application of 
Case V. Thus he has demonstrated that: 


1. An assumption of equal correlations, as well as one of zero correlations, between the stimulus pairs is tenable (21).

2. An aberrant stimulus standard deviation affects only the position of that stimulus involved (22).

3. If the aberrant stimulus dispersion is near the center of the scale, scale positions of the other stimuli will not be seriously affected (22).

4. The requirement of normality in the original distribution is not necessary (24).

Furthermore, Mosteller (23) proposed a test of goodness of fit of observed to theoretical proportions; this method is also designed to test unidimensionality.

A recent rational development by Burros (4) is noteworthy. He worked out a method for estimating stimulus dispersions. His results compare favorably with Thurstone’s as valid estimates, and they have the advantage of requiring less arithmetical computation, although Burros’ formulae are more complicated.

¹ Obtained in part by personal communication from Dr. Helen L. Koch, University of Chicago, Oct. 4, 1950.

The problem of unidimensionality of the 
paired-comparison scale has received serious 
consideration. Most investigators have ap- 
plied Thurstone’s methods to data assumed 
to be ordered along a single dimension. How- 
ever, Gulliksen’s excellent analysis (11) dem- 
onstrated the feasibility of the application of 
the method of paired comparisons to multi- 
dimensional scales. In fact, he reasoned that 
this power of the paired-comparison method 
was a significant advantage over ordinary 
ordinal scales, and he reviewed researches 
which were exemplary of these possibilities. 
In any event, determination of unidimension- 
ality or multidimensionality is an important 
factor in a specific experimental situation. 

Mosteller (23), as noted above, has developed a chi square test for unidimensionality with the restrictive assumption relaxed to equal in addition to zero correlations between the stimulus pairs. Kendall and Smith (14, 15) derived a “coefficient of agreement” (also “coefficient of consistence”) to test the assumption of linearity of the paired-comparison variate under consideration. Johnson (13) described this test, and, recently, Balinsky, Blum, and Dutka (3) demonstrated its applicability in determining the consistency of product preferences. Finally, an experimentally provocative and potentially fruitful approach to multidimensional variates was suggested by Andrews (1). He performed a factor analysis of the multidimensional elements in stimuli presented in paired-comparison form; his analysis was derived from the table of proportions conventionally calculated as part of the computational process.
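The coefficient of consistence for a single judge can be illustrated briefly. The sketch uses the standard form of Kendall's statistic, based on the number of circular triads d (A over B, B over C, C over A), which may differ in detail from the computations in the studies cited. The function name and the example matrices are hypothetical.

```python
def coefficient_of_consistence(prefers):
    """Circular triads d and Kendall's coefficient of consistence for one judge.

    prefers[i][j] is True when stimulus i was chosen over stimulus j.
    Requires at least three stimuli and one choice for every pair.
    """
    n = len(prefers)
    wins = [sum(1 for j in range(n) if j != i and prefers[i][j]) for i in range(n)]
    circular_triads = n * (n - 1) * (2 * n - 1) / 12 - sum(w * w for w in wins) / 2
    denominator = n ** 3 - n if n % 2 == 1 else n ** 3 - 4 * n
    return circular_triads, 1 - 24 * circular_triads / denominator


# A perfectly transitive judge over four stimuli: no circular triads.
transitive = [[i < j for j in range(4)] for i in range(4)]
# A fully circular judge over three stimuli: A over B, B over C, C over A.
circular = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
print(coefficient_of_consistence(transitive))
print(coefficient_of_consistence(circular))
```

A value of 1 indicates judgments that can be ordered along a single line; values near 0 indicate the maximum possible number of circular triads.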


Experimental Procedure 


The paired-comparison data analyzed in this experiment were obtained in an earlier investigation (30) where the methodologic details were fully described; the main features will be briefly reviewed. The stimuli consisted of a group of ten praiseworthy and of another group of ten blameworthy categories derived from teacher-generated motivational values as reported by school children. Each of these two groups of ten stimuli was presented in paired-comparison form to 1,120 school children in grades 6-12. The subject’s task was to judge which of each pair of stimuli was more teacher-approved or disapproved. Case V scale values were calculated from the responses to these stimuli for each sex by age-grade classification, so that each sample population included 80 subjects. Thus, a total of 28 sets of scale values, with ten stimuli in each, were computed.

For purposes of the present experiment four 
sample sets from the above data were selected 
for comparative analyses by means of three 
different scaling procedures: Thurstone’s Case 
V and Case III, and Guilford’s Short Cut. 
The sets were selected from the total popula- 
tion in such a manner as to represent both 
sexes, both experimental conditions (praise 
and blame) and, finally, different age-grade 
levels. Each sample represented a particular 
sex, experimental condition, and age-grade 
level. The specific nature of the sample sets 
can be readily observed from the captions of 
the tables and figures in the results, below. 


Results 


The scale values obtained by each of the three scaling approaches to paired comparisons are presented in Tables 1, 2, 3, and 4. The discriminal dispersions of each of the stimuli, as estimated by Thurstone’s Case III, are shown in the last column of each table. Twelve product-moment correlations obtained by comparing the scaling results calculated by the three different approaches range from .987 to .999; these intercorrelations appear in the bottom three rows of the four tables. The averages of the four intercorrelations obtained by comparing Case V with the Short Cut, Case III with Case V, and Case III with the Short Cut in all the samples are .998, .994, and .991, respectively.

It should be noted from the tables that the discriminal dispersions in the last columns are not approximately equal. It can be seen by inspection that there is a considerable range in these estimated dispersions in each of the four samples. Finally, the standard deviations of the scale values obtained by each of the three approaches are systematically smaller in Case V and in the Short Cut, respectively, than in Case III.


Table 1

“Teacher Praise” Scale Values Computed by Thurstone’s Case III and Case V and
Guilford’s Short-Cut Methods
(80 Sixth-Grade Boys)

Behavior-                                      Discriminal Dispersions
Activities       Case     Case     Short       (Estimated
(Stimulus)       III      V        Cut         by Case III)
Honest           1.21     1.08     .82         1.414
Obey              .73      .66     .49          .850
Attention         .70      .63     .48          .911
Polite            .68      .60     .47          .660
Cooperative       .38      .35     .28         1.399
Talking           .19      .21     .18          .848
Industrious       .14      .15     .16          .827
Help              .13      .14     .13          .955
Independent       .04      .04     .05          .939
Clean            0        0        0           1.195
σ                 .375     .329    .242
r (III, V)        .999
r (III, SC)       .997
r (V, SC)         .999
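The summary rows of the tables can be recomputed from the scale values themselves. The sketch below uses the three columns of Table 4 (80 tenth-grade boys) and assumes, since the article does not say, that the σ row is a population standard deviation (denominator N) and that the correlations are ordinary product-moment coefficients; the helper name is hypothetical.

```python
from statistics import mean, pstdev

# Scale values from Table 4 (Dishonest ... Untidy).
CASE_III = [1.95, 1.51, 1.49, 1.29, 1.25, 1.23, 1.08, 0.87, 0.81, 0.0]
CASE_V = [1.76, 1.42, 1.40, 1.16, 1.12, 1.10, 0.90, 0.62, 0.54, 0.0]
SHORT_CUT = [1.18, 0.96, 0.92, 0.74, 0.70, 0.67, 0.59, 0.40, 0.38, 0.0]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
    return covariance / (pstdev(x) * pstdev(y))


sigma_v = pstdev(CASE_V)
r_iii_v = pearson_r(CASE_III, CASE_V)
r_v_sc = pearson_r(CASE_V, SHORT_CUT)
print(round(sigma_v, 3), round(r_iii_v, 3), round(r_v_sc, 3))
```

The recomputed figures should agree closely with the σ and r rows printed for that sample. The smaller spread of the Short-Cut values reflects a difference of unit, not of rank order.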


Table 2

“Teacher Praise” Scale Values Computed by Thurstone’s Case III and Case V and
Guilford’s Short-Cut Methods
(80 Twelfth-Grade Girls)

Behavior-                                      Discriminal Dispersions
Activities       Case     Case     Short       (Estimated
(Stimulus)       III      V        Cut         by Case III)
Honest           3.06     2.96     1.82        1.318
Polite           2.58     2.51     1.44         .786
Industrious      2.17     2.10     1.20        1.127
Attention        2.14     2.06     1.15         .595
Independent      1.93     1.92     1.12        1.019
Cooperative      1.90     1.90     1.11        1.351
Obey             1.81     1.82     1.03         .728
Talking          1.26     1.11      .60         .823
Clean            1.20     1.06      .54        1.274
Help             0        0        0            .980
σ                 .798     .791     .482
r (III, V)        .998
r (III, SC)       .992
r (V, SC)         .997


34 Sam L. Witryol 


Table 3

“Teacher Scold” Scale Values Computed by
Thurstone’s Case III and Case V and
Guilford’s Short-Cut Methods
(80 Eighth-Grade Girls)

Behavior-                                          Discriminal Dispersions
Activities     Case      Case      Short           (Estimated
(Stimulus)     III       V         Cut             by Case III)
Rude           1.88      1.67      1.06            .685
Dishonest      1.79      1.58      1.02            1.346
Disobey        1.73      1.48      .93             .925
Disturb        1.56      1.26      [illegible]     .551
Chew Gum       1.49      1.19      .70             1.281
Fight          1.34      1.07      .69             1.184
Poor Work      1.24      .96       .63             .589
Attention      1.14      .83       .54             .910
Talking        1.09      .80       .50             1.781
Untidy         0         0         0               .747
σ              .512      .461      .292
r (III, V)           .989
r (III, Short Cut)   .988
r (V, Short Cut)     .999


Table 4

“Teacher Scold” Scale Values Computed by
Thurstone’s Case III and Case V and
Guilford’s Short-Cut Methods
(80 Tenth-Grade Boys)

Behavior-                                     Discriminal Dispersions
Activities     Case      Case      Short      (Estimated
(Stimulus)     III       V         Cut        by Case III)
Dishonest      1.95      1.76      1.18       1.352
Rude           1.51      1.42      .96        1.159
Disobey        1.49      1.40      .92        [illegible]
Fight          1.29      1.16      .74        .939
Chew Gum       1.25      1.12      .70        1.124
Disturb        1.23      1.10      .67        .855
Poor Work      1.08      .90       .59        .605
Talking        .87       .62       .40        .858
Attention      .81       .54       .38        [illegible]
Untidy         0         0         0          1.706
σ              .494      .483      .320
r (III, V)           .988
r (III, Short Cut)   .987
r (V, Short Cut)     .997


These quantitative results are graphically 
represented in Figures 1, 2, 3, and 4. 


Discussion 
The empirical comparisons in this experi- 
ment suggest the following conclusions: 


1. The Case V approach appears to yield essentially the same scale
distribution as the Case III method for the stimuli employed in this
experiment. This is true despite the fact that one assumption for
Case V (approximate equality of the estimated discriminal
dispersions) is grossly violated in each of the four samples.

2. Guilford’s Short-Cut approach appears to yield essentially the
same scale distribution as both the Case V and Case III methods for
ordering the stimuli employed in this study. This is true despite the
frequent observation in the literature that Guilford was unable to
indicate a unit for his psychological scale.


Fig. 1. “Teacher praise” scale values computed
by Thurstone’s Case III and Case V and Guilford’s
Short-Cut methods (80 sixth-grade boys).


In the opinion of the writer, these conclusions, taken in conjunction
with the empirical findings of other investigators, and considered
from the standpoint of contemporary rational developments, suggest a
number of practical and theoretical implications. One possibility
regarding the violation of the assumption of equal discriminal
dispersions for Case V is indicated from Mosteller’s work (22). He
has reasoned that if an aberrant stimulus (i.e., dissimilar in
discriminal dispersion) is near the center of the scale, there will
not be much effect upon the ordering of the stimuli along the scale
by means of Case V. This


Scaling Procedures Based on Method of Paired Comparisons 35 


explanation provides a relaxation of the restriction that the largest
discriminal dispersion should be no larger than twice the smallest
dispersion for the employment of Case V (10). The most seriously
aberrant stimuli in the data in the present investigation are for the
most part the smaller ones, and they tend to fall near the middle of
the scales in the four samples. It should also be kept in mind that
Thurstone himself admitted that his calculated dispersions were
estimates (28) and contained large probable errors.


Fig. 2. “Teacher praise” scale values computed
by Thurstone’s Case III and Case V and Guilford’s
Short-Cut methods (80 twelfth-grade girls).

An important practical consideration emerges from these
possibilities. If Mosteller’s rational efforts combined with the
empirical findings in this study point toward an increasing
generality of the applicability of Case V, then the labor of
calculations will be greatly reduced, as compared to Case III, and
these conditions might then stimulate more widespread use of a very
valuable, powerful, and somewhat neglected tool in psychological
measurement, namely the method of paired comparisons. As a matter of
fact, the classical reference to this tool as “psychophysical” is
somewhat misleading since Thurstone has emphasized that (29, p. 142),
“Although the law of comparative judgment is easily applied to the
stimuli of classical psychophysics, the more generally interesting
applications are those which involve social, moral, and esthetic
values, opinion polls, and consumer preferences.” More recently, the
method of paired comparisons has been exploited in such diverse areas
as sociometry (17, 31, 32), industry (18, 19, 25), social motivation
(30), and learning theory (12, 33).
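
The reduced labor of Case V can be illustrated with a short sketch. It assumes the usual Case V computation (each proportion is converted to a unit-normal deviate and the deviates are averaged by column); the function name, the orientation of the proportion matrix, and the example values are hypothetical and are not taken from the present data.

```python
from statistics import NormalDist


def case_v_scale_values(p):
    """Case V scale values from a square matrix of proportions.

    Assumed layout: p[i][j] is the proportion of judges who chose
    stimulus j over stimulus i. Off-diagonal proportions must lie
    strictly between 0 and 1, because the normal deviate is undefined
    at 0 and 1. Diagonal entries are ignored and treated as deviates
    of zero.
    """
    n = len(p)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [[0.0 if i == j else std_normal.inv_cdf(p[i][j]) for j in range(n)]
         for i in range(n)]
    column_means = [sum(z[i][j] for i in range(n)) / n for j in range(n)]
    lowest = min(column_means)
    # Shift the scale so that the lowest stimulus has the value zero.
    return [m - lowest for m in column_means]


# Hypothetical check: proportions generated from known scale values
# (with equal dispersions) should give those values back.
true_values = [0.0, 0.5, 1.2]
phi = NormalDist().cdf
proportions = [[phi(sj - si) for sj in true_values] for si in true_values]
recovered = case_v_scale_values(proportions)
```

Case III would additionally require an estimate of each stimulus’s discriminal dispersion before the deviates are combined, which is where most of the extra computation arises.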

Guilford’s Short-Cut method provides an even more economical approach
than Thurstone’s Case V. The shortcoming of Guilford’s approach is
the lack of an adequately defined psychological unit. Yet, it appears
to “work,” as demonstrated in the empirical findings reported in the
present study. Perhaps a possible rationale for this approach might
be found in Coombs’ (5) “ordered metric” scaling concept, although he
feels that the greater power of the method of paired comparisons is
wasted, since the method of rank orders can be easily employed for
his “psychological scaling without a unit of measurement.” It is of
interest to note here that Edwards (6, 7) has demonstrated the method
of successive intervals to be an economical alternative to the method
of paired comparisons.


Fig. 3. “Teacher scold” scale values computed by
Thurstone’s Case III and Case V and Guilford’s
Short-Cut methods (80 eighth-grade girls).


Fig. 4. “Teacher scold” scale values computed by
Thurstone’s Case III and Case V and Guilford’s
Short-Cut methods (80 tenth-grade boys).

If statistical theorists can continue to resolve some more of the
rational problems of the method of paired comparisons, there is
promise that this highly reliable technique will continue to be
expanded to an increasingly larger number of qualitative problems in
psychological measurement. This promise has been demonstrated by
practical attempts to curtail the amount of labor involved in the
subject’s task, in the ordering of the pairs, and in scoring results.
McCormick, Bachus, and Roberts (19, 20) studied the effects of
decreasing the number of pairs upon the reliability of the resulting
scales. Angoff (2) has investigated the problem of removing obsolete
items from an established paired-comparison scale and adding new
items. Kephart and Oliver (16) introduced a punched card procedure as
a laborsaving device for ordering pairs and scoring results. This
combination of empirical research and rational development has
fortified the usefulness of a powerful and extremely practical
scaling technique, the method of paired comparisons. Psychologists
interested in research with qualitative data will find a valuable aid
here.


Summary 


The purpose of this study was to make an experimental comparison of
Thurstone’s Case III and Case V, and Guilford’s Short-Cut approaches
to scaling paired-comparison data, and to review recent rational and
empirical developments of theoretical and practical significance for
the application of paired comparisons to qualitative data. The
stimuli were ten teacher-approved and ten teacher-disapproved
behavior categories presented in paired-comparison form to four
groups of school children. Each of the four groups contained a sample
of 80 subjects and represented a particular sex, experimental
condition (teacher-approved or disapproved behavior categories), and
an age-grade level in the range from grades 6-12.

The intercorrelations between the scale values obtained by the three
methods in the four samples for both sexes under both experimental
conditions were approximately unity; twelve product-moment
intercorrelations were .987 or higher. The results were interpreted
as corroborative of recent rational and empirical investigations
demonstrating the power of less complicated and more economical
approaches to scaling paired-comparison data than Thurstone’s
Case III, with the relaxation of certain restrictive assumptions.
Possibilities for broader application of the method of paired
comparisons to qualitative psychological problems were reviewed.


Received April 6, 1953. 


References 


1. [surname illegible], T. G. Multidimensional psychophysics: a new
research method. Paper read at Eastern Psychol. Ass., Atlantic City,
March, 1952.

2. Angoff, W. H. An empirical approach to a problem of psychophysical
scaling. J. appl. Psychol., 1949, 33, 59-68.

3. Balinsky, B., Blum, M. L., and Dutka, S. The coefficient of
agreement in determining product preferences. J. appl. Psychol.,
1951, 35, 348-351.

4. Burros, R. H. The application of the method of paired comparisons
to the study of reaction potential. Psychol. Rev., 1951, 58, 60-66.

5. Coombs, C. H. Psychological scaling without a unit of measurement.
Psychol. Rev., 1950, 57, 145-158.

6. Edwards, A. L. Psychological scaling by means of successive
intervals. Univ. Chicago, Psychometric Laboratory Report No. 69, May,
1951.

7. Edwards, A. L. The scaling of stimuli by the method of successive
intervals. J. appl. Psychol., 1952, 36, 118-122.

8. Guilford, J. P. The method of paired comparisons as a psychometric
method. Psychol. Rev., 1928, 35, 494-506.

9. Guilford, J. P. Some empirical tests of the method of paired
comparisons. J. gen. Psychol., 1931, 5, 64-77.

10. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

11. Gulliksen, H. Paired comparisons and the logic of measurement.
Psychol. Rev., 1946, 53, 199-213.

12. Hull, C. L., Felsinger, J. M., Gladstone, A. I., and Yamaguchi,
H. G. A proposed quantification of habit strength. Psychol. Rev.,
1947, 54, 237-254.

13. Johnson, P. O. Statistical methods in research. New York:
Prentice-Hall, 1949.

14. Kendall, M. G. The advanced theory of statistics. London: Charles
Griffin & Co., 1945.

15. Kendall, M. G. and Smith, B. B. On the method of paired
comparisons. Biometrika, 1940, 31, 324-345.

16. Kephart, N. C. and Oliver, J. E. A punched card procedure for use
with the method of paired comparisons. J. appl. Psychol., 1952, 36,
47-48.

17. Koch, Helen L. A study of some factors conditioning the social
distance between the sexes. J. soc. Psychol., 1944, 20, 79-107.

18. Lawshe, C. H., Kephart, N. C., and McCormick, E. J. The paired
comparison technique for rating performance of industrial employees.
J. appl. Psychol., 1949, 33, 69-77.

19. McCormick, E. J. and Bachus, J. A. Paired comparison ratings.
I. The effect on ratings of reductions in the number of pairs.
J. appl. Psychol., 1952, 36, 123-127.

20. McCormick, E. J. and Roberts, W. K. Paired comparison ratings.
II. The reliability of ratings based on partial pairings. J. appl.
Psychol., 1952, 36, 188-192.

21. Mosteller, F. Remarks on the method of paired comparisons: I. The
least squares solution assuming equal standard deviations and equal
correlations. Psychometrika, 1951, 16, 3-9.

22. Mosteller, F. Remarks on the method of paired comparisons: II. The
effect of an aberrant standard deviation when equal standard
deviations and equal correlations are assumed. Psychometrika, 1951,
16, 203-206.

23. Mosteller, F. Remarks on the method of paired comparisons: III. A
test of significance for paired comparisons when equal standard
deviations and equal correlations are assumed. Psychometrika, 1951,
16, 207-218.

24. Mosteller, F. Some miscellaneous contributions to scale theory:
Remarks on the method of paired comparisons. Harvard University
Laboratory of Social Relations, Report No. 10, 1952.

25. Satter, G. A. Method of paired comparisons and a specification
scoring key in the evaluation of jobs. J. appl. Psychol., 1949, 33,
212-221.

26. Thurstone, L. L. A law of comparative judgment. Psychol. Rev.,
1927, 34, 273-286.

27. Thurstone, L. L. The measurement of opinion. J. abnorm. soc.
Psychol., 1928, 22, 415-430.

28. Thurstone, L. L. Stimulus dispersions in the method of constant
stimuli. J. exp. Psychol., 1932, 15, 284-297.

29. Thurstone, L. L. Psychophysical methods. In Andrews, T. G. (Ed.),
Methods of psychology. New York: Wiley, 1948, Chapt. 5, 124-157.

30. Witryol, S. L. Age trends in children’s evaluation of
teacher-approved and teacher-disapproved behavior. Genet. Psychol.
Monogr., 1950, 41, 271-326.

31. Witryol, S. L. and Thompson, G. G. A critical review of the
stability of social acceptability scores obtained with the
partial-rank-order and the paired-comparison scales. Genet. Psychol.
Monogr., in press.

32. Witryol, S. L. and Thompson, G. G. An experimental comparison of
the stability of social acceptability scores obtained with the
partial-rank-order and the paired-comparison scales. J. educ.
Psychol., 1953, 44, 20-30.

33. Zeaman, D. An application of sEr quantification procedure.
Psychol. Rev., 1949, 56, 341-350.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Reliability and the Number of Rating Scale Categories 


A. W. Bendig ¹
University of Pittsburgh 


A recent study (1) has presented evidence that the reliability of
rating scales is independent of the number of categories on the
scale. In this study Ss rated themselves on their comparative
knowledge about 12 foreign countries. The scales used varied in the
number of verbal anchors used to define the scale categories (1, 2,
or 3) and in the number of categories (3, 5, 7, 9, or 11). Both group
and individual rater reliability were relatively invariant over the
range from three to nine categories, with a slight drop in
reliability at eleven scale points. These results contradict the
theoretical analysis of Symonds (7), who concluded that scale
reliability should increase with greater numbers of scale categories,
but that this increase becomes negligible above nine scale points.

However, the empirical results reported were germane only to one type
of reliability: the reliability with which Ss can distinguish between
stimuli presented to them. This is the type of reliability analysis
that is important when raters are given the task of rating stimuli on
some criterion scale and the mean rating for each stimulus is to be
used as the criterion measure. The assessment of the reliability of
pooled supervisor ratings of workers is a practical example of this
type of problem. A second question subject to reliability analysis is
how well a series of self-ratings discriminates among the Ss. Many
psychometric instruments can be regarded as a series of stimuli
presented to the Ss with the request that the S rate himself on a
two, three, or R category scale. For example, the Strong Vocational
Interest Blank commonly requires self-rating on a three-category
scale while the revised Bogardus Scale of Social Distance uses a
seven-point scale. Commonly the total score of an S on these
instruments is the sum or mean of his ratings and the reliability
question concerns the ability of the test’s total score to
discriminate among the Ss. This type of reliability is the more usual
“test reliability” compared to the first type which we might call
“rater reliability.” Our first study (1) suggested that Symonds’
analysis does not hold for rater reliability, but did not present
evidence concerning test reliability.

¹ Miss Janine Sprague assisted with some of the statistical
computations.

The present report concerns a study of the reliability of food
preference ratings. Some years ago Wallen (9) found that responses to
a check list of food aversions significantly discriminated between
groups of normal and neurotic military personnel. Because of the
restricted range of food aversions in his normal groups Wallen
reports no reliabilities for normal Ss. However, the data given (9,
pp. 79-80) permit the application of Kuder-Richardson formula 20 (8,
p. 92). Using this formula the reliabilities for two normal groups
are estimated to be .28 (N = 100) and .82 (N = 114). The weighted
mean (r-to-z transformation) of these two estimates is .64. The
original Wallen check list used a rating scale with only two
categories: strong dislike or acceptance of the 20 food stimuli
presented to the S. Symonds’ conclusions would suggest that the
somewhat low reliability of this instrument should increase if the Ss
were allowed to rate the foods on rating scales containing more
categories that would permit the Ss to make finer discriminations
among the foods.
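
The weighted mean by the r-to-z transformation quoted above can be retraced in a few lines. This is only a sketch: the report does not say which weights were used, so weights of N - 3 are assumed here (weighting by N gives the same value to two decimals), and the function name is hypothetical.

```python
from math import atanh, tanh


def weighted_mean_r(estimates):
    """Average correlation-type coefficients through Fisher's z.

    estimates is a list of (r, n) pairs. Each r is converted to
    z = atanh(r), the z values are averaged with weights n - 3
    (an assumption, see the text above), and the mean z is
    converted back to r.
    """
    total_weight = sum(n - 3 for _, n in estimates)
    mean_z = sum((n - 3) * atanh(r) for r, n in estimates) / total_weight
    return tanh(mean_z)


# The two reliability estimates reported for Wallen's normal groups.
wallen_groups = [(0.28, 100), (0.82, 114)]
pooled = weighted_mean_r(wallen_groups)
```

Rounded to two decimals, the pooled value agrees with the .64 given in the text.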


Procedure 


Scales. The stimuli to be rated by the Ss were the list of 20 foods
used by Wallen (9). This list was given to each S with a rating scale
having either 2, 3, 5, 7, or 9 categories. For the two-category scale
the instructions used by Wallen (9, p. 78) were given. These
instructions were modified for the other scales and three anchoring
statements were used to describe the center and end categories of
these scales. The statements used were: (a) I like this food very
much and eat it often; (b) I am somewhat neutral toward this food,
neither liking nor disliking it much; and (c) I dislike this food so
much that I refuse to eat it.

The anchored scales with unit digits (1, 2, etc.) designating the
categories were mimeographed on single sheets with the list of 20
foods and randomly distributed to the Ss.

Subjects. The Ss were 249 students in in- 
troductory and social psychology classes. The 
ratings of Ss were excluded from the analysis 
whenever the S used less than one-half of the 
available categories in rating the foods. Thus 
an S, using the five-category scale, who used 
only ratings of one and five was not included. 
A total of 13 Ss was eliminated under this 
criterion, giving a study group of 236 Ss. 
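
The exclusion rule can be written as a one-line check. The sketch below is an illustration under a stated assumption: “less than one-half of the available categories” is read as a count of distinct categories used that falls below half the number of categories on the scale. The function name is hypothetical.

```python
def uses_enough_categories(ratings, n_categories):
    """Return True if the rater used at least one-half of the categories.

    ratings is the list of category numbers one S assigned to the foods;
    n_categories is the number of categories on that S's scale.
    """
    return len(set(ratings)) >= n_categories / 2


# The case described in the text: a five-category scale on which only
# the ratings one and five were used.
extreme_only = [1, 5, 5, 1, 1, 5]
kept = uses_enough_categories(extreme_only, 5)
```

Under this reading an S on the two-category scale is retained even when only one category is used, since one category is exactly half of two.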

Analysis. Test reliability was assessed us- 
ing the analysis of variance technique devised 
by Hoyt (4, 5) and recommended by Thorn- 
dike (8, pp. 93-96). Rater reliability was 
estimated by the similar procedures described 
by Ebel (2). Since the number of raters 
varied slightly from scale to scale, the re- 
liability of a single rater was computed to 
adjust for this varying N. Confidence limits 
(90 per cent) were computed following Ebel 
(2, pp. 413-414). Finally, the average rank- 
difference correlations between the rankings 
of the foods on the five rating scales were 
computed (6, pp. 80-84). 
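
The two analysis-of-variance estimates can be sketched from a single table of ratings. The code assumes the common textbook forms of these estimates: reliability of row totals as 1 - MS(residual)/MS(rows), and single-rater reliability as an intraclass coefficient with between-column variance removed from error. Whether these match every detail of the procedures cited is an assumption, and all names and data are hypothetical.

```python
def two_way_mean_squares(scores):
    """Mean squares for a rows x columns table with one score per cell.

    Returns (ms_rows, ms_columns, ms_residual).
    """
    n_rows, n_cols = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in scores]
    col_means = [sum(row[j] for row in scores) / n_rows for j in range(n_cols)]
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n_rows - 1)
    ms_cols = ss_cols / (n_cols - 1)
    ms_resid = ss_resid / ((n_rows - 1) * (n_cols - 1))
    return ms_rows, ms_cols, ms_resid


def reliability_of_totals(scores):
    """Reliability of the row totals: 1 - MS_residual / MS_rows."""
    ms_rows, _, ms_resid = two_way_mean_squares(scores)
    return 1 - ms_resid / ms_rows


def single_rater_reliability(scores):
    """Intraclass estimate for a single column (one rater or one item)."""
    ms_rows, _, ms_resid = two_way_mean_squares(scores)
    k = len(scores[0])
    return (ms_rows - ms_resid) / (ms_rows + (k - 1) * ms_resid)


# Hypothetical ratings: 4 rows (the things being discriminated) by 3 columns.
example = [[1, 2, 2], [3, 3, 4], [2, 4, 3], [5, 4, 5]]
r_total = reliability_of_totals(example)
r_single = single_rater_reliability(example)
```

For test reliability the rows would be Ss and the columns foods; for rater reliability the rows would be foods and the columns Ss. The two functions are linked by the Spearman-Brown relation, which is why a single-rater figure can be used to compare scales rated by different numbers of Ss.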


Results 


The test reliability and rater reliability estimates for each rating
scale can be found in Table 1 along with the 90 per cent confidence
interval for each reliability. The five test reliability estimates
were tested for homogeneity using the chi-square method described by
Edwards (3, p. 135). The resulting chi-square value was 1.12 which,
with four degrees of freedom, is not significant at the .05
confidence point. The mean reliability was .625. A similar test of
the homogeneity of the rater reliabilities gave a chi-square of 1.76
which again is not significant. The mean rater reliability was .23.

The average rank-difference correlation between the rankings of the
20 foods on the five scales was .90 when corrected for ties. Since
there were a number of foods tied in rank on the scales with two and
three categories, a similar average rho was computed on the food
rankings on scales with five, seven, and nine categories and was
found to be .91.


Table 1

Reliability Estimates of Food Preference Rating Scales
with Various Numbers of Scale Categories

                             Number of Rating Scale Categories
                             2      3             5             7             9
Number of Subjects           52     [illegible]   52            [illegible]   [illegible]
Test Reliability             .61    .63           .58           .70           .60
  Confidence Limits (.90)
    Upper                    .72    [illegible]   [illegible]   [illegible]   [illegible]
    Lower                    .44    .43           .39           [illegible]   [illegible]
Rater Reliability            .07    .33           [illegible]   [illegible]   [illegible]
  Confidence Limits (.90)
    Upper                    .12    .45           [illegible]   [illegible]   [illegible]
    Lower                    .03    .20           [illegible]   [illegible]   [illegible]


Discussion 


The results in terms of the test reliabilities are fairly
unequivocal. No consistent trend was found in the relation of test
reliability and number of scale categories. This suggests that
Symonds’ (7) analysis does not hold for test reliability. It is
interesting to note that the mean reliability found for the five
scales, .625, is very similar to the estimate from Wallen’s data,
.64. While the highest reliability, .70, was found with seven
categories, the two lowest reliabilities, .58 and .60, were found
with the immediately adjacent numbers of categories (five and nine).

Rater reliability was not as regular as test reliability. The
invariance of reliability over the range of five to nine categories
that was found in a previous study (1) is here confirmed. However,
rater reliability rose at three categories and dropped for two
categories in this study. The drop at two may be attributable to the
slightly different instructions to the Ss with this scale. The
slightly greater reliability with three categories cannot be
explained by different instructions, although it must be pointed out
that this reliability is not much higher and, when tested
statistically, is not significant. Before we can extend the
conclusion of invariant reliability below five scale categories,
further investigation will be necessary.

It is interesting to note that the rater reliabilities found for our
list of 20 foods are somewhat lower than those found for ratings of
foreign countries (1, p. 39). This lower rater reliability for foods
may be a function of the type of judgment required of the Ss, of the
greater number of judgments required of the Ss (20 instead of 12), or
of a greater homogeneity among the 20 foods than was present among
the 12 countries.


Summary 


Ss (N = 236) rated 20 foods as to preference using rating scales
containing 2, 3, 5, 7, and 9 categories. Test reliability (summed
ratings for each S) and rater reliability (summed ratings for each
food) were computed for each scale. Test reliability was constant
over the entire range of categories and was very similar to
reliabilities found in another study. Rater reliability was constant
from five to nine categories, but was slightly lower at two and
slightly higher at three categories. It was concluded that test
reliability is independent of the number of scale categories, and
that rater reliability is relatively constant, but that further
research using short scales is needed before a similar generalization
can be made regarding rater reliability.


Received April 18, 1953. 


References 


1. Bendig, A. W. The reliability of self-ratings as a function of the
amount of verbal anchoring and of the number of categories on the
scale. J. appl. Psychol., 1953, 37, 38-41.

2. Ebel, R. L. Estimation of the reliability of ratings.
Psychometrika, 1951, 16, 407-424.

3. Edwards, A. L. Experimental design in psychological research. New
York: Rinehart, 1950.

4. Hoyt, C. Test reliability obtained by analysis of variance.
Psychometrika, 1941, 6, 153-160.

5. Hoyt, C. J. and Stunkard, C. L. Estimation of test reliability for
unrestricted item scoring methods. Educ. psychol. Measmt, 1952, 12,
756-758.

6. Kendall, M. G. Rank correlation methods. London: Griffin, 1948.

7. Symonds, P. M. On the loss of reliability in ratings due to
coarseness of the scale. J. exp. Psychol., 1924, 7, 456-461.

8. Thorndike, R. L. Personnel selection. New York: Wiley, 1949.

9. Wallen, R. Food aversions of normal and neurotic males. J. abn.
soc. Psychol., 1945, 40, 77-81.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954


The Inference of Accident Liability from the Accident Record 


Alexander Mintz 
City College of New York 


It has been known for a long time that accident liability ¹ of
people, that is their potential long range accident rate, and the
actual number of accidents occurring to them, i.e., their accident
record, are imperfectly correlated. This was already clearly implied
in the classical 1920 paper by Greenwood and Yule (3) on accidents.
Newbold (9) presented in 1927 a formula for estimating the
correlation between accident records and accident liability. Cobb (1)
pointed out in 1940 that this correlation need not be high. Mintz and
Blum (8) examined a large number of published distributions and found
that the estimated variance of accident liability usually accounts
only for a relatively small portion of the variance of accident
records, thus confirming Cobb’s finding. Quite recently, [name
illegible] (4) included in his summary of the [illegible] research on
accidents tables and graphs implying the imperfect correlation
between accident liability and records. These tables and graphs
utilize Greenwood and Yule’s theoretical inference that for any one
given degree of liability in people there should be a Poissonian
distribution of accident records. His table presents the probability,
for different degrees of accident liability and for different mean
group liabilities, that a person should have twice as many accidents
as the mean for the total group.

In the papers mentioned the notion is utilized that a given degree of
accident liability tends to result in a Poisson distribution of
accident records. This notion has many theoretical uses, but its
practical usefulness is limited by the fact that in the case of
particular individuals the degree of accident liability is quite
unknown, so that the Poissonian probability distributions of accident
records cannot be arrived at. The accident record of individuals is
often available. What is often needed is a procedure for estimating
the unknown accident liability in terms of the known accident record.
The main problem of this paper is: given a known distribution of
accident records, and a particular accident record belonging to this
distribution, how probable are the different assumed degrees of
accident liability which may correspond to this particular accident
record?

¹ “Accident liability” is a more general term than accident proneness
because it includes both personal and environmental conditions
predisposing people to accidents. Exact constancy of environmental
hazards is hard to prove, so that it is probably normally more
accurate to refer to accident liability rather than proneness.

In this general form, the problem has no answer. It will be treated
here in terms of certain assumptions first explored theoretically by
Greenwood and Yule (3) and empirically by Greenwood and Woods (2).
These assumptions were:

(1) accident liability of people is not changed by accidents in which
they are involved and does not vary with time; ³ and
(2) accident liability varies among people and is distributed in some
known manner, e.g., in accordance with a Pearson Type III curve.

These assumptions have not been definitely shown to be true, but they
are fairly well supported by available evidence, so that a further
exploration of their implications is in order.

The Solution 


The following considerations indicate the nature of the solution.
Accident liability and accident record may be treated as two
correlated variables, the former as the independent, the latter as
the dependent variable. The distribution of accident liability is
assumed to be known, or to be capable of being estimated from the
data; so are the theoretically Poissonian distributions of accident
records in the columns of the scatter diagram. Together these two
types of information define a complete correlation surface; this
correlation surface describes the probability distribution of various
possible combinations of accident liability and accident record.
There should be no difficulty in determining the distributions in the
rows of such a correlation surface. Such a distribution would
indicate how probable are various degrees of accident liability in
the case of a particular accident record, presupposing the
assumptions of accident liability being unaffected by the occurrence
of accidents and having a known distribution.

² Or approximately known. There are a number of pitfalls in the way
of precise characterization of accident records which have been
discussed in the literature.

³ Actually, a somewhat weaker assumption is sufficient, as has been
pointed out by Kerrich (6).

The mathematical derivation of such a dis- 
tribution is presented below. It assumes as 
was suggested by Greenwood and Yule that 
accident liability is distributed in a Pearson 
Type III curve. In this particular case the 
answer is a very simple one: If the distribu- 
tion of accident liability for the whole group 
is of the Pearson III type, then the probable 
distributions of accident liability are also of 
the Pearson III type, but with changed con- 
stants in the formula. The changing of the 
constants results in changed means and stand- 
ard deviations which vary from those of the 
whole group, and also vary according to the 
accident record of the specific subgroups. 

h 


Mathematical Derivation- 


Poisson distribution: Probability of j acci- 
dents for group with accident liability \: 


ENNI 


j! 


where ¢ = 2.718--- (base of natural loga- 
rithms), y 


Pearson TIT distribution of accident liability: 


c? N 
—— er- 
T (p) 3 
where À is liability and ¢ and $ are constants 
related as follows to the mean (m) and variance 
. (v) of the distribution: m = 2, v= 2 i 
Greenwood-Yule derivation of negative bino- 
mial distribution: Probability of A liability and f 
accidents: product of formulae for Pearson III 


and Poisson distributions: 


ga EN cP Pe, 
——— p-chy pl ee i ey p— (e+) A) PTT y 
rA Xa" 


To determine the probability of j accidents for i 
all A-s, this expression has to be integrated over : 
all values of A, so that O<  < w- 


i Ta eTO PH TdN = 


E E E 
if (c Bes e 


c” o eTR dx 


IT) Jo (+ Dette 1 


( g yd eyri idy 

c+1/ ji (p)(e + 1y 
= (by definition of T-function) 

( c i r(p +j) 

c+1/ pI (p)(e+ 1) 
(general term of negative binomial distribu- 
tion). Ph, 
Derivation of probable distribution of accident liability for given accident record: Probability of accident liability λ and accident record j, in relation to all possible combinations of λ-s and j-s:

$$\frac{c^{p}}{j!\,\Gamma(p)}\,e^{-(c+1)\lambda}\,\lambda^{p+j-1}$$

Probability of accident liability λ and accident record j, in relation to the combined probability of combinations of this particular j with all λ-s:

$$\frac{\dfrac{c^{p}}{j!\,\Gamma(p)}\,e^{-(c+1)\lambda}\,\lambda^{p+j-1}}{\left(\dfrac{c}{c+1}\right)^{p}\dfrac{\Gamma(p+j)}{j!\,\Gamma(p)\,(c+1)^{j}}} = \frac{(c+1)^{p+j}}{\Gamma(p+j)}\,e^{-(c+1)\lambda}\,\lambda^{p+j-1}$$

(Estimated distribution of accident liability λ corresponding to given accident record j. The distribution is one of Pearson’s Type III. It has an equation of the same form as that given above for the Pearson III distribution, but with changed constants; c + 1 replaces c, p + j replaces p.)
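
The result above (c + 1 replacing c, and p + j replacing p) can be checked numerically. The following is a minimal Python sketch and not part of the original derivation: the function names and the constants p = 2.0 and c = 0.7 are illustrative assumptions, not the values fitted in the paper. It forms the conditional density directly as (Pearson III × Poisson) / negative binomial and compares it with the closed form.

```python
import math


def pearson_iii_pdf(lam, p, c):
    """Pearson Type III (gamma) density: c^p / Gamma(p) * lam^(p-1) * exp(-c*lam)."""
    return math.exp(p * math.log(c) - math.lgamma(p)
                    + (p - 1) * math.log(lam) - c * lam)


def poisson_pmf(j, lam):
    """Probability of j accidents for a given liability lam."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def neg_binomial_pmf(j, p, c):
    """General term of the negative binomial distribution derived above."""
    return math.exp(p * math.log(c / (c + 1)) + math.lgamma(p + j)
                    - math.lgamma(j + 1) - math.lgamma(p)
                    - j * math.log(c + 1))


def liability_density_given_record(lam, j, p, c):
    """Joint probability of (lam, j) divided by the probability of j."""
    return pearson_iii_pdf(lam, p, c) * poisson_pmf(j, lam) / neg_binomial_pmf(j, p, c)


# Hypothetical constants, chosen only for illustration.
p, c = 2.0, 0.7
for j in (0, 5, 11):
    for lam in (0.5, 2.8, 6.0):
        direct = liability_density_given_record(lam, j, p, c)
        closed_form = pearson_iii_pdf(lam, p + j, c + 1)
        assert math.isclose(direct, closed_form, rel_tol=1e-9)
```

The densities are computed through logarithms (`math.lgamma`) so that the factorials and Γ-functions do not overflow for larger accident counts.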


Inference of Accident Liability from Accident Record 


The estimated Pearson III curves of accident liability for subgroups with given accident records may be interpreted in two ways: first, as representing the probable numbers of people with various levels of accident liability in a subgroup with a given accident record; and, second, as representing the degrees of probability of these various levels of liability for people with given accident records. Only the second interpretation is appropriate in the case of small subgroups.


Illustrative Results 


The accident distribution reported in Greenwood and Woods’ Table 8A was used for the computation of Pearson III curves as just explained. This set of data was chosen for two reasons: (1) it can be closely approximated by the theoretical distribution derived from the Greenwood-Yule assumptions (the so-called negative binomial distribution), which suggests that these assumptions may hold true in this case; (2) this set of data was suggestive of a higher correlation between accident record and liability than the other Greenwood and Woods sets of data (2). It was thought therefore that the demonstration of a relatively wide spread of probable accident liability corresponding to a particular accident record would be particularly convincing. Table 1 presents this set of data, together with the negative binomial distribution fitted by the method of moments.

Table 1

No. of        No. of      Theoretical
Accidents     People      (Neg. Binomial)
    0            8             8.7
    1           11            10.1
    2            ?             8.9
    3            ?             6.9
    4            ?             5.0
    5            ?             3.5
    6            ?             2.4
    7            ?             1.6
    8            ?             1.0
    9            ?             0.7
   10            ?             0.4
   11            1             0.3
Total           50            49.5

? Entry illegible.

Figure 1 presents the three Pearson III curves for the subgroups with zero, five, and eleven accidents (a subgroup of one).
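
Fitting by the method of moments can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation actually carried out for Table 1: it uses the fact that, under the Greenwood-Yule model, the accident counts have mean p/c and variance p/c + p/c², so that c = m/(v − m) and p = mc. The function names are hypothetical, and the variance of 6.8 in the example is an invented figure, since the observed variance is not stated in the text.

```python
import math


def fit_by_moments(mean, variance):
    """Return (p, c) such that the negative binomial has the given mean and variance."""
    if variance <= mean:
        raise ValueError("the negative binomial requires variance > mean")
    c = mean / (variance - mean)
    p = mean * c
    return p, c


def neg_binomial_pmf(j, p, c):
    """General term of the negative binomial distribution with constants p and c."""
    return math.exp(p * math.log(c / (c + 1)) + math.lgamma(p + j)
                    - math.lgamma(j + 1) - math.lgamma(p)
                    - j * math.log(c + 1))


def expected_frequencies(n_people, p, c, max_accidents):
    """Theoretical numbers of people with 0, 1, ..., max_accidents accidents."""
    return [n_people * neg_binomial_pmf(j, p, c) for j in range(max_accidents + 1)]


# Group mean of 2.8 as reported in the text; the variance is hypothetical.
p, c = fit_by_moments(mean=2.8, variance=6.8)
for j, freq in enumerate(expected_frequencies(50, p, c, 11)):
    print(f"{j:2d} accidents: {freq:5.1f} people")
```

Because the distribution is cut off at a finite number of accidents, the theoretical frequencies add up to slightly less than the number of people, as in the Total row of Table 1.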

The curves show that a large group whose 
members have had five accidents apiece is 
likely to include some persons whose poten- 
tial accident records have a very wide range. 
There is actually some noticeable overlapping 
even between the probability curves of lia- 
bility of the two extreme subgroups with zero 
and eleven accidents. 

There is a very considerable amount of 
overlapping between the liability curve for 
the five-accident group and the other two. 

The estimated Pearson III distributions of 
accident liability for people with given acci- 
dent records enable one to estimate the com- 
bined probability of their accident liability 
falling within certain ranges, e.g., the range 
below the mean of the whole group or above 
twice the group mean, or from the first to the 
third quartile of the whole group. One can 
do this by integrating the expression for the 
Pearson III curve, or by using tables of the 
Pearson III integral (e.g., 10).
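
In place of printed tables, the integral of the Pearson III curve can be evaluated with a short series computation. The sketch below is illustrative only: the function names are hypothetical, and the constants p = 1.96 and c = 0.7 are assumed values chosen to give a group mean of 2.8, not the constants fitted in the paper. It computes the probability that liability falls below a chosen threshold for a person with accident record j, using the distribution with constants p + j and c + 1.

```python
import math


def regularized_lower_gamma(a, x, max_terms=1000):
    """P(a, x): integral of the Pearson III (gamma) density with shape a and unit rate from 0 to x.

    Uses the series gamma(a, x) = e^-x * x^a * sum_n x^n / (a (a+1) ... (a+n)).
    """
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def prob_liability_below(threshold, j, p, c):
    """Probability that liability is below `threshold`, given j recorded accidents."""
    return regularized_lower_gamma(p + j, (c + 1) * threshold)


# Hypothetical constants with p / c = 2.8 (the reported group mean).
p, c = 1.96, 0.7
group_mean = p / c
for j in range(0, 12):
    print(j, round(100 * prob_liability_below(group_mean, j, p, c), 1))
```

The same function gives the probability for any other range, for example above twice the group mean, as one minus `prob_liability_below(2 * group_mean, j, p, c)`.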

Table 2 presents the results of such a pro- 
cedure for two published distributions of ac- 
cidents. These distributions are that in 
Greenwood and Woods’ Table 8A and that 
of 29,531 Connecticut car drivers discussed 
by Cobb (1). The figures represent the prob-
ability that the true accident liability of a per-
son with a given accident record is below the
mean⁴ of the whole group. The two means
were 2.8 accidents per person and .24 acci- 
dents per person for the Greenwood-Woods 
set 8A and for the Connecticut drivers, re- 
spectively. 

In the Greenwood-Woods set of data, a person who had no accidents has 95.2 chances in a hundred of having accident liability below the mean of the whole group. For people with 1 accident, the probability of accident liability below the group mean of 2.8 is 85.7 per cent, and so on. Similar statements can be made about the Connecticut car drivers. It should be noted that 95 per cent certainty about accident liability being greater than that of the group mean is not reached until the person has had 8 or more accidents in the Greenwood-Woods sample, 3 or more accidents in the sample of Connecticut drivers. These accident records are reached by only a few persons: 5 out of 50, or 10 per cent, in the former case, and 51 out of 29,531, or .17 per cent, in the latter case.

⁴ In subsequent discussions, there are references to the probability of accident liability being either above or below the group mean. This is done for the sake of simplicity; the infinitesimal probability of accident liability being exactly at the group mean is disregarded.

[Figure 1: curves for the subgroups with 0, 5, and 11 accidents. Horizontal axis: possible degrees of accident liability.]

Fig. 1. Pearson Type III curves ($y = \frac{(c+1)^{p+j}}{\Gamma(p+j)}\,e^{-(c+1)\lambda}\,\lambda^{p+j-1}$) representing the probability distributions of different degrees of accident liability for given accident records. Source of data: Greenwood and Woods, Table 8A.


Table 2

Probability (in Per Cent) of Accident Liability
Below the Mean of the Whole Group

Number of     Greenwood-Woods     Connecticut
Accidents     Table 8A            Drivers
    0             95.2               76.6
    1             85.7               40.9
    2             70.6               15.9
    3             52.6                4.9
    4             33.0                1.1
    5             20.9                0.3
    6             11.3                0.1
    7              5.6
    8              2.5
    9              1.4
   10              0.4
   11              0.1

Alexander Mintz 




Discussion 


The immediately preceding statements 
should not be confused with the customary 
statements of the level of statistical signifi- 
cance. If one states that a finding is signifi- 
cant at the 5 per cent level, one means that, 
if the null hypothesis is assumed to be valid 
for the population, deviations as great or 
greater than the one found are expected to be 
found in only 5 per cent of the samples. The 
statement, “the probability of accident lia-
bility below the group mean is 5 per cent in
people having 3 accidents” does not presup-
pose an assumed null hypothesis and is not
intended to be a test of a null hypothesis.
On the contrary, it presupposes the existence 
of differences in accident liability and charac- 
terizes the probable percentage of below-av- 
erage liability among people who had 3 acci- 
dents each. 

Taken at their face value, the figures in Table 2 exhibit the way in which accident liability below the group mean becomes less probable and accident liability above the mean becomes more probable in the case of persons with the larger numbers of accidents. Clearly, the accident records have some validity as information about accident liability. In the Greenwood-Woods set of data, 5 or more accidents mean accident liability above the group mean in at least four cases out of five. In the Connecticut drivers’ sample, drivers who had 2 or more accidents can be expected to have accident liability above the group mean in more than 5 cases out of 6. On the other hand, relative certainty that a given individual has accident liability above the mean of the whole group can only be reached in a very small number of people. Whether one chooses to emphasize the fact that accident records have some validity as long-range predictors, or the limitations of their validity, is presumably dependent on one’s scientific level of aspiration.

Should figures be taken at their face value?
The answer depends on whether the Green-
wood-Yule unequal liability assumptions are
to be accepted. The principal evidence on
their validity seems to be as follows:


1. The negative binomial distribution which, as theoretically derived by Greenwood and Yule, was based in part on these assumptions fits obtained accident distributions very well. Newbold (9) showed, in effect, that it usually fits them better than any other theoretical distribution embodying the ideas of unequal accident liability in people and unchanged accident liability after accidents.

2. The negative binomial distribution can be derived by the use of different sets of assumptions and therefore does not differentiate between them and the Greenwood-Yule assumptions. Thus Irwin (5) showed that the negative binomial distribution is to be expected if there are no initial differences in accident liability and if accident liability increases as a linear function of accidents. Lundberg (7) quotes rather similar deductions by Polya and Eggenberger.

3. There are a few available sets of accident data for the same people during successive periods. In terms of the Greenwood-Yule assumptions the accident rates of people should remain unchanged. In terms of the assumptions explored by Irwin they should increase. According to the evidence presented by Irwin and by Kerrich (6), the accident rates vary only slightly with time, and tend to decrease rather than to increase.

In terms of the evidence presented, the ma- 
jor inferences from the Greenwood-Yule as- 
sumptions appear to be in accord with avail- 
able data in many cases. However, more re- 
search is needed, particularly in view of the 
scanty available evidence on accidents in suc- 
cessive periods. There are theoretical con- 
siderations making the exact truth of Green- 
wood and Yule’s assumptions of unchanged 
liability after accidents rather unlikely. Nev- 
ertheless, the available evidence strongly sug- 
gests that in the cases in which the negative 
binomial distribution fits the data the Green- 
wood-Yule assumptions may be viewed as ap- 
proximating the truth. The inferences from 
these assumptions pertaining to the probable 
degree of accident liability which may corre- 
spond to given accident records then may be 
tentatively accepted as approximately true in 


many cases. 
Summary 


The classical assumptions of unchanged ac- 
cident liability after the occurrence of an ac- 
cident were provisionally accepted. Certain 
further implications of these assumptions were 
explored. The assumed distributions of acci- 
dent liability in groups of people were broken 
up into probable component distributions of 
liability for subgroups with given accident 
records. These component distributions were 
found to have the same form as the total distribution if the latter is of type III. Quantitative examples of applications of this finding were given. It was pointed out that accident records have some validity as indicators of accident liability, but that relative certainty about high accident liability of particular persons can be achieved in terms of their accident records only in a small minority of cases.


Received April 20, 1953. 
References 


1. Cobb, P. W. The limit of usefulness of acci- 
dent rate as a measure of accident proneness. 
J. appl. Psychol., 1940, 24, 154-159. 




2. Greenwood, M. and Woods, H. M. The inci- 
dence of industrial accidents upon individuals 
with specific reference to multiple accidents. 
Industr. Fatigue Res. Bd. Report 4, 1919. 

3. Greenwood, M. and Yule, G. U. An enquiry 
into the nature of frequency distributions rep- 
resentative of multiple happenings, with par- 
ticular reference to the occurrence of multiple 
attacks of disease or of repeated accidents. 
J. Roy. Statist. Soc., 1920, 83, 255-279. 

4. Hughes, H. M. Discriminatory analysis. III. 
Discrimination of the accident prone indi- 
vidual. USAF School of Aviation Medicine. 
Project Report number 21-49-004. Report 
number 3. Oct. 1950. 

5. Irwin, J. O. Comments on paper by Chambers, 
E. G. and Yule, G. U. J. Roy. Statist. Soc., 
1941, 7 (suppl.), 101-107. 


6. Kerrich, J. E. The mathematical background. 
In Arbous, A. G. and Kerrich, J. E. Accident 
statistics and the concept of accident prone- 
ness. National Institute for Personnel Re- 
search, No. 391, 1951. South African Council 
for Scientific and Industrial Research.

7. Lundberg, O. On random processes and their
application to sickness and accident statistics. 
Uppsala, Almquist and Wiksell, 1940. 
8. Mintz, A. and Blum, M. L. A re-examination of 
the accident proneness concept. J. appl. Psy- 
chol., 1949, 33, 195-211. 

9. Newbold, E. M. Practical applications of the 
statistics of repeated events. J. Roy. Statist. 
Soc., 1927, 90, 487-547. 

10. Pearson, K. (editor). Tables of the incomplete 
Gamma function. London, H. Maj. Stationery
Off., 1922.


The Journal of Applied Psychology
Vol. 38, 1954


The Development of Criteria of Safe Operation for Groups


Harry Waller Daniels and Harold A. Edgerton 


Richardson, Bellows, Henry & Co., Inc., 1 West 57 Street, New York 19, N. Y.


It is a task of the psychologist, in investigations of relationships between various aspects of the operation or management of groups and the effectiveness of the groups, to find a way of measuring this effectiveness. When group effectiveness is the goal, these attempts at measurement can be extremely difficult. Many times they fall short of the mark and do not really measure what the managers of the groups would themselves call effectiveness.

In a study of the relationship of various management factors to the relative safety effectiveness of Army motor vehicle units, done under contract with the Department of the Army,¹ the present authors and their colleagues at Richardson, Bellows, Henry and Company came to grips with this knotty criterion problem.

The primary objective of the study was to determine the relationship, if any, between driving safety of motor vehicle units and management and supervisory practices used in these units. The orientation of the present paper is limited to a summary of the attempts to define and measure the criterion variables.

Although few psychologists have done intensive work in this area, the history of research on accident proneness, safe driving, and related problems is long and varied. Two reviews are available: Johnson’s (1) and Lawshe’s (2). The authors of these reviews note that few investigators have differentiated between driving skill and safe driving, and that the early investigators concentrated on “simpler” functions like depth perception rather than “higher” functions like attitudes, etc.

The research presented in this paper has as its orientation the safe operation of a motor vehicle unit, rather than the more usual orientation of skill in driving. Thus, the first question to which efforts were directed was: How could the units which were “high” and those which were “low” in safety of operation be properly identified so that another observer using the same procedures would arrive at the same identification?

¹ The opinions and conclusions expressed in this article are those of the authors; they do not necessarily reflect official Department of the Army policy or the views of anyone other than the authors.

Preliminary investigation of accident rates 
of motor vehicle units led to the conclusion 
that, for the purposes of this study, such data 
were inadequate. Differences in definition of 
a reportable accident, in accuracy of mileage 
estimates, in mission, in equipment, traffic 
conditions encountered, etc., were factors in- 
volved. Added to this was the statistical un- 
reliability of reported accident rates for, say, 
50 vehicle motor units. 

Such information and evidence led toward 
ratings of safety of performance as a method 
of identifying units for further study. In ad- 
dition, it was not feasible to restrict the study 
to motor vehicle units which were fairly com- 
parable in organization, equipment, mission, 
and conditions of operation. To produce 
really useful results, the study had to en- 
compass motor units as they occurred rather 
than as one might like to have them set up 
for a “tight” experimental design. 


Procedure 


The forms used were constructed on the basis of the preliminary surveys and the field tryout. The criterion procedure was as follows:

Relative ratings of over-all safety of operations 
of all motor vehicle units in an installation were 
asked for. Criterion rating sessions were held, 
attended by post or divisional staff officers,
Provost Marshals, Safety Officers and Directors,
and other persons who would have an acquaint-
ance with the comparative performances of the
motor vehicle units at the installation. Motor
officers and the sergeants of the individual motor 
units did not attend these sessions but were asked 
to fill out the rating forms on their own units at 
the time of intensive study (not reported here) 
of their own units. 

In the criterion rating sessions three forms 
were filled out by the participants after pertinent 




instructions. Since these meetings were informal 
gatherings, it was possible for the RBH field rep- 
resentative to monitor the rating procedures of 
the participants and thus ensure that the instruc- 
tions were being followed. Discussion of units 
being rated was not permitted; but otherwise 
conversation flowed freely during these sessions. 
The forms used were as follows: 

The Familiarity Rating Form (CRT-29, RBH 
Form R212-J) was constructed for the raters to 
indicate how well they knew each of the motor 
vehicle units at a particular installation. This 
form was designed to overcome the objections 
frequently made by those officers that they could 
not judge the over-all safety of a motor unit be- 
cause they were not sufficiently familiar with it. 
The form was used to identify the motor vehicle 
units with which they were best acquainted. 

While the use of the Familiarity Rating Form 
did not completely overcome their reluctance, it 
seemed to relieve some tension in the criterion 
rating sessions. The form merely listed the mo- 
tor units in the post or division, and the officers 
were instructed to rate their familiarity with the 
units as follows: 


“0”: if unfamiliar

“1”: if slightly familiar

“2”: if familiar with the unit’s personnel,
driving, and other factors to be rated.


The Safety Factors Rating (CRT-26, RBH 
Form R212-G) was then given to the group 
members who were asked to rate, on 16 aspects 
of over-all safety, the six units with which they 
were most familiar. The raters who did not 
know six units well enough to rate them rated 
only those with which they had indicated fa- 
miliarity. 

The Criterion Ranking Form (CRT-27, RBH 
Form R212-H) was then administered. On this 
form the raters identified in order, from those 
with which they were familiar, up to six units 
which they thought were “best” from a viewpoint 
of all-around safety, and up to six units which 
were “worst” in all-around safety. 

From the analysis of these forms it was pos- 
sible to select a number of “high” safety and 
“low” safety units from each post or division. 
In many cases, upon further acquaintance with 
the unit (e.g., the Xth Ordnance Battalion), it 
was found that the unit which was selected as 
high or low really contained more than one motor 
unit (e.g., Companies A, B, C, and Hq.). In
such cases, the Battalion, Regimental, or Group 
staff went through the criterion procedures as 
outlined above for the motor units under their 
cognizance. Company or Battery level motor 
units were selected from these ratings, with this 
limitation: only the units rated lowest were se- 
lected from Battalions, Regiments, or Groups 
previously rated low, and only the highest were 
selected from units previously rated high. 


A Check on the Criterion Groups 


In spite of the experience gained early in 
the study at various Army installations, which 
showed that the usual accident and mileage 
records were unsuitable for our purposes, an 
attempt was made to provide an objective 
criterion measure of this general type. 


It had originally been planned to collect acci- 
dent frequency statistics for the units. Review 
of unit safety records during the pilot field study 
had indicated that accident frequency statistics 
were inadequate to permit differentiation among 
relatively safe and unsafe motor units. The results of this trial study, however, showed that many incidents, which could be construed as related to the safe operation of motor units, were not being reported as accidents. To utilize this information, the Vehicle Damage Report (CRT-30, RBH Form R212-K)² was devised. It was hoped that this form would provide a higher degree of objectivity and serve to substantiate or refute the selection of the units on the basis of the ratings and rankings. Data on the frequency of damages occurring within individual units, as an empirical measure of their safety in operation, were collected. The report was essentially a list of approximately 50 damages which could occur to a vehicle as a result of an impact. These were compiled and divided into nine general areas (e.g., Bumper Assembly, Body-Front, Body-Sides, Wheel Assembly, etc.). The list was further subdivided into types of vehicles.

The list was administered as a group interview with the units’ motor sergeants, motor officers, mechanics, etc. Copies of the list were handed to each member of a group so that the list could serve as a stimulus to recognition and recall of damages which had occurred to the unit’s vehicles during the preceding calendar month. They were encouraged to use whatever records they had available, and to consider each vehicle separately, one at a time.


Analysis and Final Criterion Groups 


From two division and four post headquarters, 93 motor units were rated by varying numbers of raters. From these 93, 16 “high” units and 16 “low” units were chosen. The analysis of these criterion measures and how they were used in making the choices are given below.


The scoring for the Safety Factors Rating was based on results of the preliminary study at different installations in which a key was derived and shown to be related to the ranking of motor units (r = [value illegible]). Using all of the 93 rated units from the present sample, this empirical scoring was related to the rankings, r = .83. This relationship may reflect considerable [word illegible], but as far as one can rely on the [word illegible] of the ratings in the criterion rating sessions, this consistency seems to substantiate the selection of criterion units which are low or high. Whenever there was disagreement between two or more qualified raters as to whether a unit was high or low, the unit was not included in the final criterion groups. Whenever there was inconsistency between a unit’s average score on the ratings and on the rankings, the unit was not included in the final groups.

The field utilization of the Safety Factors Rating and the Criterion Ranking forms, by which the selection was made, had certain limitations. It was impossible to have each rater rate all the units (which would have been the most desirable procedure) because few raters were sufficiently familiar with the units to do so. It was a fortunate but infrequent instance when three raters could rate the same unit. For this reason, many units which, on the basis of rating and ranking scores, would appear to be definitely “low” or “high” were rated by only one person, and so had to be discarded in favor of other units on which two or more raters had agreed on the unit’s relative position in the installation.

The research team discovered early in the field study that some criterion raters considered themselves to be more or less qualified than other raters. Therefore, the qualifications of the raters, as they were informally expressed during the criterion sessions, were also taken into account in selecting units, particularly when there was a question about wide deviations in the scores of the units. A further consideration in the selection of units for intensive study of administrative practices (not reported in this paper) was their comparability in terms of number and kinds of vehicles, missions of the units, and special functions or hazards.

² The authors are indebted to Mr. Warren R. [surname illegible] of Richardson, Bellows, Henry and Company for the development of the Vehicle Damage Report.


Table 1

Means and Standard Deviations of Scores for the
High, Low, and Total Groups for Criterion
Rankings and Safety Factors Ratings

                      Rankings*        Ratings
Groups                M       σ        M       σ
High (16 units)      37.9     5.1     10.1     2.3
Low (16 units)       23.1     5.2      2.5     5.2
Total (93 units)     30.5    13.9      6.5     4.5

* Each unit’s rankings (by one or more persons)
converted to a standard score scale that has a mean of
30 and a σ of 10.

The means and standard deviations of the 
scores of the finally selected criterion groups 
are shown in Table 1. 

The mean rating and ranking scores of the 
93 units and related data are given in Ta- 
ble 2. 

In the selection of units, where a choice 
was possible, varied units were selected. This 
was done so as to include in the sample as 
many differently structured units with differ- 
ent missions as possible. Over-all, however, 
the selected high units and low units were 
similar. 

For the high group, 9 were post units and 
7 divisional units. For the low group, 10 
were post units and 6 divisional units. The 
breakdown (Table 3) shows the make-up in 
more detail. 

It seems to be established, therefore, that the selected high and low groups, while quite similar in types and numbers of vehicles, numbers of drivers, sizes of units, and missions, were in reality different, presumably in terms of performance.

Since the Safety Factors Rating was used by various personnel at the criterion sessions, it was desirable to see if there were any significant differences in the way in which various groups rated. In other words, were the ratings as a whole homogeneous? This question resolved itself into the testing of the hypothesis that the ratings of four groups of raters were random selections from the same universe. The four groups in question were: Group A. Provost Marshals, Safety Officers, and Directors; Group B. Post Ordnance, Maintenance, Transportation and Motor Officers; Group C. Staff Officers and others not classed in Group A, B, or D; and Group D. Unit Motor Officers and NCO’s.


Table 2

Mean Ratings of Units by Higher Echelon Criterion Raters

        Mean     Mean                         Mean     Mean
Unit    Rating   Rank**   Selected    Unit    Rating   Rank**   Selected
1*       -2       33                  48        7       22
2*       -1       27                  49       11       40
3*       -1       18                  50        2       21
4        13       38                  51       11       36
5         7       30                  52       10       33        H
6        11       31                  53       10       43        H
7        10       39                  54        8       25
8         3       24                  55        8       24
9        11       40        H         56        8       24
10       10       33        H         57        4       28
11        7       25                  58        9       34
12        7       25                  59       10       39        H
13*       5       40                  60        4       25
14*       1       43        ?         61        9       38
15*       3       35                  62        2       18        L
16        4       20        L         63        8       31
17        2       25                  64        8       31
18        4       30                  65        6       35
19        9       35                  66        6       24
20        9       25                  67        5       26        L
21       14       38        ?         68        6       36
22       12       19        ?         69       10       47        H
23*      10       25                  70       11       47        H
24        ?       18        L         71        2       27
25        9       39        H         72        6       33
26        9       30                  73        7        ?
27       10       41        H         74        6       34        L
28       11       32                  75        ?        ?
29        ?        ?        ?         76        1       26
30        7       22                  77        2        ?
31        2       25                  78        ?        ?
32        ?        ?                  79        5       33        H
33        8       34                  80        ?        ?
34        7        ?                  81        9       36        H
35        8       24        ?         82        ?        ?        ?
36        5        ?                  83        4       26        L
37      None       -                  84        3        ?        ?
38       13       34                  85        ?        ?        ?
39        0       19                  86       10        ?        ?
40        5       29                  87       13        ?        H
41        3       24        ?         88       12       33
42        5       36        L         89        9        ?
43        3       29        L         90       12        ?
44        9       27                  91       15        ?
45        2       25                  92        4        ?
46        9       33                  93        4        ?
47        4       39

* Only one rater rated the unit.
** Converted to standard scores as in Table 1.
? Entry illegible.


Table 3

Kinds of Units in the Sample

                               High     Low
Kind of Motor Unit             Group    Group
Car Company                      2        ?
Truck Company                    1        3
Ordnance Company                 3        ?
MP Units                         3        1
Engineer Unit                    1        2
QM Company                       1        4
Signal Company                   1        ?
FA Battalion                     1        3
HQ Company                       1        ?
Administrative Motor Pool        2        2
Ambulance Company                -        2
Heavy Tank Battalion             -        1
Antiaircraft Battalion           ?        1

? Entry illegible.


Because of the possibility that there might be a difference in the homogeneity of ratings among the groups when rating high units as opposed to when rating low units, it was decided to analyze the ratings of high units separately from the ratings of low units. Analysis of variance was employed to test the above hypothesis. Of 16 items tested, F ratios for low units were significant at P < .01 for ten items; for high units, only two items were rated differently by the 4 groups.
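
The F ratio for one item can be sketched as a one-way analysis of variance across the rater groups. The code below is a minimal illustration under assumptions: the function name is hypothetical, the ratings in the example are invented, and the source does not state the exact computational layout that was used. Judging significance at P < .01 still requires comparing the result with a table of F for the returned degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of ratings."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented ratings of one item by four rater groups (A, B, C, D).
ratings = [[2, 3, 3, 2], [3, 3, 4, 2], [2, 4, 3, 3], [4, 5, 4, 5]]
f_ratio, df_b, df_w = one_way_anova_f(ratings)
print(round(f_ratio, 2), df_b, df_w)
```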

A second item analysis of the Safety Fac- 
tors Rating was made in which the responses 
of the motor officers and motor sergeants 
(Group D) were compared with all of the 
criterion raters’ responses together (Groups 
A, B, and C). The results of this analysis, 
using Strong’s method (3) to obtain response
weights, and testing for significance with chi
square, showed that a significant difference
existed between the higher echelon officers 
and the motor unit leaders on each of the 16 
items. 

It seems, therefore, that the motor officers
and sergeants do disagree with the criterion
raters in rating their own units. Linked with
the results of the item analysis of variance,
this means that motor officers and sergeants
rate their units differently than do the higher
echelon officers (criterion raters). This was
noticed on inspection of the response fre-
quencies of the two groups: the criterion rat-
ings are consistently lower than the ratings
by the motor units’ own personnel. It is
likely that this difference is due to a typical 
overrating of one’s own organization which 
might be expected from the motor officers 
and motor sergeants. They are unable to 
place realistically their own unit in the con- 
text of units at the installation. This differ- 
ence in groups, however, is much more evi- 
dent in the ratings of low units than in the 
ratings of high units, since the analysis of 
variance showed 10 of the 16 items to be 
rated differently by the groups in rating low 
units, opposed to only two items when rating 
high units. 

Leaders of low units, both officers and 
NCO’s, are less able than high unit leaders 
to place their unit realistically relative to the 
other motor units on the post with regard to 
the over-all safety of the unit. 

In order to determine which of the factors 
rated on the Safety Factors Rating seemed to 
be differentiating between high and low units, 
a further item analysis was made, in which 
the responses of the motor officers and motor 
sergeants from the high units were compared 
with those from the low units. For this com- 
parison to be made, it was necessary to com- 
bine the ratings of NCO’s with those of the 
motor officers. This was possible since the 
functions of the two groups were closely in- 
tertwined in the administration of motor 
pools. The item analysis comparing the re- 
sponses of all motor sergeants with all motor 
officers (rating their own units) showed that 
statistically there was no reason to suppose 
that the former group had rated their units 
differently than the latter group. 

The item analysis of motor officers’ and motor sergeants’ responses comparing high and low groups showed no marked differences in answering the questions. Essentially, this means that the motor officers and motor sergeants rate their own units the same regardless of their unit’s relative position in over-all safety as rated by the higher echelon officers. This corroborates the analysis of variance results reported above, in which it was seen that the unit leaders consistently rate their own units high on the Safety Factors Rating, regardless of where the criterion raters place the unit.


Table 4

Percentage of Vehicles Damaged per Vehicle Operated in Period

                             High                                                 Low
             Number    Number   Per Cent  Units      Units       Number    Number   Per Cent  Units      Units
Vehicle      Oper-     Dam-     Dam-      Reporting  Operating   Oper-     Dam-     Dam-      Reporting  Operating
Type         ated      ages     ages      Damages    Vehicles    ated      ages     ages      Damages    Vehicles     CR
¾ ton [?]     419       26       6.2        6          14         337       25       7.4       10          15          ?
¼ ton [?]      61        7      11.5        2          13          71        6       8.5        3          13        .579
1½ ton         73        0       0          0           7          70       20      28.6        1           7          ?
2½ ton        297       15       5.1        7          15         312       19       6.1       10          16          ?
Sedans        101       14      13.9        3           4          50        8      16.0        2           3        .202 [?]
Misc.          78        9      11.5        3          12         128       28      21.9        5          13          ?
Total       1,029       71       6.9       11          16         968      106      11.0       14          16        3.10

? Entry illegible.

The Vehicle Damage Report was analyzed 
to determine if the numbers and kinds of 
damages, as recalled and reported in an in- 
terview situation by the units’ mechanics, 
would show differences between high and low 
units. 

The numbers of damages incurred by the 
high and low groups were summed separately 
by types of vehicles. The numbers of dam- 
ages by type for each criterion group were 
equated to the number of vehicles of that 
type operated and maintained. In addition, 
the numbers of damages were adjusted to the 


number of trips made in one month. The results
are summarized in Tables 4 and 5.

Table 4 shows a significant difference between
the percentages of 1½ ton vehicles
damaged in the high group (none) as compared
to 28.6% damaged in the low criterion
group. It should be noted, however, that
only one unit (which reported 20 accidents
to vehicles of this type) caused this significant
difference. The difference is seen also
when relative amount of use is considered in
adjusting damages to the number of trips
made during the period by 1½ ton vehicles
(Table 5).

A second significant difference occurs between
the percentages of vehicles damaged in
the miscellaneous category (heavy engineering
equipment, ambulances, wreckers, trailers,
etc.). When relative use is considered,
however, the significance of this difference
disappears, since the low groups use these
vehicles more frequently than do the high
groups.


Table 5
Percentage of Vehicles Damaged per Trip Made during the Preceding Month

Vehicle     High units                                            Low units
Type        Number   Number   Per Cent  Units      Units          Number   Number   Per Cent  Units      Units        CR
            Trips    Damages  Damages   Reporting  Operating      Trips    Damages  Damages   Reporting  Operating
                                        Damages    Vehicles                                   Damages    Vehicles
¾ ton       6,007    26       .4        6          14             3,579    25       .7        10         15           [illegible]
¼ ton       993      7        .8        2          13             572      6        1.0       3          13           [illegible]
1½ ton      836      0        0         0          7              980      20       2.0       1          7            4.53
2½ ton      3,916    15       .4        7          15             2,536    19       .8        10         16           1.89
Sedans      1,382    14       1.0       3          4              886      8        .9        2          3            .26
Misc.       893      9        1.0       3          12             2,252    28       1.2       5          13           .56
Total       13,937   71       .5        11         16             10,805   106      1.0       14         16           [illegible]


Development of Criteria of Safe Operation for Groups 53

When the difference between percentages
of all types of vehicles operated during the
period is considered (Table 4), it is found to
be significant. Moreover, this difference
remains significant when the relative frequency
of use is considered (Table 5).

These results are interpreted to mean that
the low criterion units are relatively unsafe as
compared to the high units. The ability of
higher echelon officers to make criterion
rankings and ratings in terms of safety of
operation is substantiated and it is concluded
that the subjective criterion has real
validity.
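
The significance tests reported for Tables 4 and 5 can be illustrated with a critical ratio (CR) for the difference between two independent percentages. What follows is only a sketch in Python, assuming a pooled-proportion standard error; the paper does not state which formula the authors used, and the function name `critical_ratio` is hypothetical. Applied to the Table 4 totals (71 damages among 1,029 in the high units, 106 among 968 in the low units), it gives a ratio of roughly 3.2, close to the 3.10 printed in the table.

```python
from math import sqrt


def critical_ratio(x1, n1, x2, n2):
    """Critical ratio for the difference between two independent proportions.

    Assumption: the standard error uses the pooled proportion; the source
    does not say which formula was actually applied.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


# Totals from Table 4: damages and number operated, high vs. low units.
cr_total = critical_ratio(71, 1029, 106, 968)
print(round(cr_total, 2))
```

A ratio above 1.96 corresponds to the conventional 5 per cent level, so the total difference would be judged significant under this assumed formula as well.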


Summary 


The development of criterion measures of
safety of operation for groups reported in this
paper proceeded from a consideration of
previous measures reported in the literature,
to utilization of rating and ranking procedures
to obtain preliminary criterion groups
of motor vehicle units. The criterion was
not accepted as valid, however, until an
investigation of damages showed a relationship
to the preliminary grouping of units. It is
the authors’ opinion that criteria derived
from ratings or rankings should be verified
by showing them to be related to some critical
behavioral aspects of effectiveness, acceptable
to the psychologist, to the raters,
and to the groups being studied.


Received April 9, 1953. 


References 


1. Johnson, H. M. The detection and treatment of
accident-prone drivers. Psychol. Bull., 1946,
43, 489-532.

2. Lawshe, C. H., Jr. A review of the literature
related to the various psychological aspects of
highway safety. Purdue University Engineering
Bulletin, 1939, 23, 2a, Lafayette, Ind.,
Engineering Experiment Station, Purdue University.

3. Stead, W. H., Shartle, C. L., and Associates.
Occupational counseling techniques. New York:
The American Book Co., 1940, Appendix VI,
253-255.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 1, 1954 


Visual Acuity Measurements by Wall Charts and Ortho-Rater 
Tests * 


D. A. Gordon, J. Zeidner, H. J. Zagorski, and J. E. Uhlaner 
Personnel Research Branch, TAGO, Dept. Army, Washington, D. C.


Recently several instruments involving opti- 
cal simulation of distance have been devel- 
oped for large scale acuity testing. Among 
such devices are the Bausch and Lomb Ortho- 
Rater, the Keystone Telebinocular, and the 
American Optical Company Sight-Screener. 
These instruments provide means of present- 
ing tests of right eye, left eye, and binocular 
acuity, as well as vertical and horizontal 
phoria, stereopsis, and color vision. Both 
near and far simulated distances may be used. 

For the measurement of far visual acuity, 
optical instruments have several advantages 
over the usual method of wall chart or alley 
testing. The light source is “built in” and,
therefore, can be made relatively accurate. 
Alley charts, on the contrary, vary widely in 
conditions of illumination. The viewing dis- 
tance of instruments is achieved optically, 
with consequent economy of testing space. 
Targets may be conveniently changed with- 
out crossing the testing room. And of course, 
a variety of visual functions may be tested 
on the same instrument. 

Before any one of these new instruments
can be considered seriously for extensive
visual testing, it should be compared with
wall chart presentation. This study should
be made on the basis of relative difficulty,
reliability, and similarity of functions measured.
The present paper deals with these
problems; a comparison is made between
acuity scores on wall charts and on the
Bausch and Lomb instrument test.


Review of Literature 


The reliability of wall chart tests of far visual
acuity has been determined (2). Data are also
available on the reliability of instrument tests
(1, 3). A rigorous comparison between these
reliabilities cannot be made because of differences
in the populations, test targets, and light levels
employed. A study by Sulzman, Cook and
Bartlett (6) did employ the same sample in
comparing the reliabilities of instrument and
wall chart tests. The instruments they employed
included the Sight-Screener, the Ortho-Rater,
and the Telebinocular. It was found that the
reliabilities of the letter wall chart tests were
about the same as those of the instrument tests.
They ranged from .80 to .88 for the two wall
chart tests, and between .81 and .85 for the
three instrument tests. In near visual acuity
testing, reliabilities were also similar. The wall
charts, however, seemed to be testing a visual
function somewhat different from that of the
instrument tests. The correlation between the
letter wall chart tests was considerably higher
than that between wall and instrument tests. If
the correlations had been corrected for attenuation,
the difference would be even larger. The authors
conclude that these results may be due to the
introduction of some new factor related to the
optical system of the instrument or to the fact
that different targets are used in the various tests.

Altman and Rowland (3) determined the
relationship between scores obtained on an
Ortho-Rater and a wall chart when the same target
was used. The wall chart was an accurate
enlargement of the plate reproduced for presentation
at 20 feet. One hundred and fifty-seven
[word illegible] were tested without refractive
corrections in order to secure a wide range of
acuity scores. A correlation of .94 was obtained
between acuity scores on the Ortho-Rater and
wall chart tests. This study presents supporting
evidence of the identity of the visual abilities
measured by the two methods.

In the present experiment, an attempt was
made to compare the test-retest reliabilities and
to obtain a measure of the correspondence
between scores on Ortho-Rater and wall chart
tests. The same subjects, targets, and light
level were employed in both methods of
presentation. The conditions of luminance and
contrast between object and background were
equalized as closely as possible. With control of
these conditions, more definitive conclusions may
perhaps be reached concerning the reliabilities of
the two presentation methods, and the presence
or absence of the apparatus accommodation
factor thought by some to affect machine scores
(5).


* Any opinions expressed herein are those of the
authors and do not necessarily reflect those of the
Department of the Army.


Method and Procedure


The present experiment was conducted in
the Personnel Research Branch’s Pentagon
Laboratory in Washington, D. C.


Fig. 1. Visual acuity items (Army Snellen and modified Landolt ring).


The subjects were 117 soldiers from Fort Myer,
Virginia. Soldiers varied in age between 19 and
37 years, with the mean age at 22.4 years, and
a standard deviation of 2.6 years. The test targets
were observed binocularly. All subjects who
customarily wore corrective lenses used them in
the experiment.

Twenty examinees who either reported having
trouble with their eyes such as irritation, watering,
or fatigue, or who reported having driven
the night before, were considered as a special
group. The decision was made to include the
results of this group in the analysis after no
significant differences in estimates of reliability were
found between this group and the soldiers
reporting no eye trouble.

The test designs included a letter chart and a 
modified Landolt ring chart. Samples of the 
items in these targets are shown in Figure 1. 

The letter chart was a modification of the
Snellen chart employed by the Army in routine
visual acuity examinations. Items were added
to give more adequate discrimination where too
few items and sizes were found on the old test.
The chart consisted of 12 lines of letters ranging
in size from 20/100 to 20/7.1 Snellen. The
modified Landolt ring chart presented a square
appearance rather than the circular design used in the
Landolt ring. The chart contained 11 lines of
items, ranging in size from 20/135.2 to 20/5.9
Snellen.
The Ortho-Rater plates were made from the
wall charts by a double reduction photographic
process. It was intended to reduce the wall
charts (constructed for testing at 20 feet) to
2295 of the original size. Actually the reduction
ratios, as determined by an optical comparator
with microscopic attachment, are: letter
plate, right eye .0546, left eye .0545; Landolt
plate, right eye .0552, left eye .0549. The visual
angles corresponding to the reduction ratios of
the Ortho-Rater letter targets are slightly smaller
than those of the counterpart wall chart; the
visual angles of the Ortho-Rater Landolt targets
are almost identical to their wall chart.


1 The authors wish to acknowledge their indebtedness
to Mr. Owen Conger, Typo and Design Unit,
Army Publication Service Branch, TAGO, for his
drafting of these vision targets.

2 This reduction ratio is employed by the Bausch
and Lomb Company in the manufacture of Ortho-Rater
plates. It is based on an estimated distance
of 40 mm. from lens surface to the eye, and 362 mm.
from the far plate to the eye.

The laboratory in which testing took place 
was constructed in conformity to specifications 
formulated by the Armed Forces—NRC Vision 
Committee. The viewing distance was 20 feet 
for wall chart testing. Illumination was fur- 
nished by three overhead lights in flashed opal 
glass fixtures. These fixtures were evenly spaced 
along the testing alley. The front of the alley, 
sides, top, and floor were covered by white osna- 
burg cloth which served to provide an evenly lit 
surround over the visual field. 

The brightness of the wall charts and Ortho-Rater
plates was 13.5 millilamberts. A MacBeth
Illuminometer was employed in making light
measurements. In calibrating the brightness of
the Ortho-Rater plates, observations were made
against a blank plate with the eyepiece of the
instrument removed. A correction was added to
adjust for loss of light to be expected in trans- 
mission through the eyepiece. The required 
Ortho-Rater and wall chart brightnesses were 
secured before each session, by use of a volt- 
meter and a continuously variable resistance 
(variac). 

Before being tested, each subject was shown 
sample targets of the designs to be used. The 
testing procedure was carefully explained. It 
was emphasized that he was to keep reading each 
test until told to stop. The subject was encour- 
aged to guess if he was not sure. 

The examiner observed the subject at all times 
to make sure that he did not squint or view the 
charts obliquely. The subject was rested from 
time to time. Responses were transmitted elec- 
trically to an adjacent room where they were 




checked by a technician and recorded on pre- 
pared answer forms. 

The following presentation order of tests was 
maintained: wall chart letter, wall chart modi- 
fied Landolt, Ortho-Rater letter, Ortho-Rater 
modified Landolt. These tests were a portion of 
a larger group of 17 mesopic and photopic targets 
given in the same session. Subjects had ob- 
served five mesopic wall charts and two mesopic 
Ortho-Rater plates before taking the four tests 
discussed here. The letter wall chart was the 
third test given on the photopic level, the modi- 
fied Landolt wall chart was the fifth, the Ortho- 
Rater letter plate was the eighth, and the Ortho- 
Rater modified Landolt plate was the ninth of 
the ten tests given at the photopic level. The 
same procedure was followed in the retest ses- 
sion two weeks later. 


Results 


An indication of the relative difficulty of 
wall chart and Ortho-Rater presentation is 
shown in Table 1. The mean represents the 
average number of items achieved by the sub- 
jects before the criterion of failure was met. 
These results are presented for four scoring 
methods: 

(a) Number of rights before two consecu- 
tive miscallings were first made; (b) Number 
of items attempted before two consecutive 
miscallings were first made; (c) Number of 


rights before three consecutive miscallings 
were first made; and (d) Number of items 
attempted before three consecutive miscall- 
ings were first made. These were utilized to 
show the effect of scoring method on results 
and, thus, give the results wider generality.
It will be recognized that these scoring methods
are non-independent measures.
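
The four scoring methods can be made concrete with a short Python sketch. It rests on one reading of the definitions, which the paper does not spell out: "before k consecutive miscallings were first made" is taken to mean all items preceding the start of the first run of k misses. The function name `score` and the sample response list are hypothetical.

```python
def score(responses, k):
    """Return (rights, attempted) before the first run of k consecutive miscalls.

    responses: list of bools in reading order, True = item called correctly.
    Assumption: items inside the failing run itself are not counted.
    """
    run = 0
    for i, ok in enumerate(responses):
        run = 0 if ok else run + 1
        if run == k:
            stop = i - k + 1  # index at which the failing run begins
            return sum(responses[:stop]), stop
    # Criterion of failure never met: score the whole record.
    return sum(responses), len(responses)


record = [True, True, False, True, False, False, True, False, False, False]
method_a, method_b = score(record, 2)  # rights / attempted, two-miss criterion
method_c, method_d = score(record, 3)  # rights / attempted, three-miss criterion
print(method_a, method_b, method_c, method_d)  # 3 4 4 7
```

Because all four scores are computed from the same response record, they necessarily move together, which is the sense in which the methods are non-independent.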

It may be seen that subjects were able to 
read further on the wall chart letter tests 
than on the Ortho-Rater letter plates before 
meeting the criterion of failure. This differ- 
ence in difficulty may perhaps be explained 
by the somewhat larger visual angle of the 
letter wall charts (see Method and Procedure).
The scores on the Landolt tests,
where more perfect reproduction of visual
angle was achieved, are about equal for the
two methods of presentation. The standard
deviations are approximately the same, except
that the Ortho-Rater Landolt retest
shows greater variability than its wall chart.
Although there are several significant differences
in means and standard deviations of
the two methods of presentation, the differences
are too small to be of practical importance.
In Snellen acuity units, negligible
changes in scores are implied. As shown in

Table 1
Comparison of Means and Standard Deviations for Wall Chart and Ortho-Rater Tests (N = 117)

                        Mean                            Standard Deviation
            Scoring     Wall     Ortho-   t*            Wall     Ortho-   t
Target      Method      Chart    Rater    Ratio         Chart    Rater    Ratio
Letter      A           62.6     62.0     1.00          11.0     11.1     0.15
(Test)      B           64.8     64.2     1.14          11.0     11.5     0.82
            C           65.0     63.3     3.07          10.4     11.1     1.46
            D           68.9     66.3     4.53          10.8     11.8     1.74
Letter      A           63.3     61.4     3.61          11.5     11.6     0.32
(Retest)    B           65.6     63.7     3.26          12.2     12.0     0.40
            C           65.7     63.6     3.94          11.0     10.8     0.32
            D           69.6     67.6     3.84          11.5     11.2     0.63
Landolt     A           60.9     61.0     0.19          11.7     12.5     1.10
(Test)      B           62.3     63.0     0.90          12.0     12.8     1.07
            C           63.3     63.5     0.26          11.3     12.2     1.35
            D           66.8     67.5     0.94          12.0     12.9     1.31
Landolt     A           62.0     62.7     1.12          11.3     13.2     2.96
(Retest)    B           63.8     64.7     1.44          11.7     13.2     2.31
            C           64.3     65.1     1.15          11.3     13.7     3.64
            D           67.8     69.1     [illegible]   12.3     14.6     3.32

* A t ratio of 1.96 indicates that the difference obtained is significant at the 5 per cent level of confidence.
A t ratio of 2.58 indicates that the difference obtained is significant at the 1 per cent level of confidence.


Figures 2 and 3, the test distributions are
seen to be similar. In general, the evidence does
not indicate that Ortho-Rater and wall chart
tests differ greatly in difficulty and
variability.


Fig. 2. Distribution of scores on the new Army
tests (number of cases by score interval, wall chart
and Ortho-Rater). Items 30-39 of the tests are 20/20
acuity value (N = 117).

Fig. 3. Distribution of scores on the modified
Landolt ring tests (number of cases by score interval,
wall chart and Ortho-Rater). Items 38-45 of the
tests are 20/20 acuity value (N = 117).


The test-retest reliabilities of wall chart
and Ortho-Rater scores are shown in Table
2. The Ortho-Rater reliabilities, with one
exception, are significantly higher than those of
the wall charts.

The higher reliabilities of the Ortho-Rater
cannot be explained by the fact that
the Ortho-Rater plates were administered
after the wall charts. If increased reliability
is associated with later tests administered in
the session, the Armed Forces Far Visual
Acuity test, administered twice, should have
shown this effect. This test was administered
as the second and tenth (last) test of the
photopic series. The test administered in the
last position shows a significant decrease in
reliability for two scoring methods and a
nonsignificant increase for two other methods.

The correlations between scores on the wall
charts and on the Ortho-Rater plates are
presented in Table 3. These correlations are
based on scoring method C, which was the
most reliable method employed (see Table 2).
They are about as high as the test-retest
reliabilities. The mean of the correlations is
equal to .83; the mean of the reliabilities of
scoring method C is equal to .85. The mean
of the correlations, corrected for the attenuating
effects of unreliability in each variable, is

Table 2
Wall Chart and Ortho-Rater Test-Retest Reliabilities (N = 117)

            Letter Test                  Landolt Test                 Far Visual Acuity Test
Scoring             Ortho-   t*                  Ortho-   t                             t
Method      Wall    Rater    Ratio       Wall    Rater    Ratio       Second   Tenth    Ratio
A           .81     .90      3.30        .73     .81      1.94        .90      .83      2.72
B           .78     .89      3.69        .69     .79      2.14        .88      .81      2.43
C           .88     .92      2.04        .75     .85      2.82        .81      .86      1.55
D           .80     .87      2.23        .65     .79      2.91        .80      .81      0.29

* In this context a t ratio of 2.58 indicates that the difference is significant at the 1 per cent level of confidence.
A t ratio of 1.96 indicates that the difference is significant at the 5 per cent level of confidence. In computing the
t ratios, the correlation between z-transformations of the reliabilities was estimated at .40 for all comparisons by
an approximation formula described by McNemar (4, p. 125). A standard error of .103 was obtained from the
formula $\sigma_{z_1 - z_2} = \sqrt{\dfrac{2 - 2 r_{z_1 z_2}}{N - 3}}$ used in obtaining the t ratios.
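
The t ratios in Table 2 can be approximated from the quantities given in the table footnote. The Python sketch below assumes that each reliability is converted to Fisher's z and that the standard error of the difference between two correlated z values is the square root of (2 - 2r)/(N - 3), with r = .40 and N = 117; this form is a reconstruction that reproduces the quoted standard error of .103, not a formula confirmed by the printed text. The function names are hypothetical.

```python
from math import atanh, sqrt


def se_diff_z(n, r_zz):
    """Assumed standard error of the difference between two correlated z values."""
    return sqrt((2 - 2 * r_zz) / (n - 3))


def t_ratio(r_wall, r_ortho, n=117, r_zz=0.40):
    """Difference between Fisher z-transformed reliabilities over its standard error."""
    return (atanh(r_ortho) - atanh(r_wall)) / se_diff_z(n, r_zz)


se = se_diff_z(117, 0.40)
t_letter_d = t_ratio(0.80, 0.87)  # letter test, scoring method D
print(round(se, 3), round(t_letter_d, 2))
```

For the letter test under scoring method D (.80 against .87) this yields a ratio of about 2.3, against the printed 2.23; a small discrepancy is expected because the printed reliabilities are rounded to two places.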




Table 3
Correlations between Wall Chart and Ortho-Rater Tests (Scoring Method C) 1 (N = 117)

                                                    O-R Test     Wall Test
                                                    Session      Session
                                                    and Wall     and O-R
                              Test       Retest     Retest       Retest
Variables                     Session    Session    Session      Session
Wall chart vs. Ortho-Rater    .85        .87        .86          .89
  (Letter)                    (.94)      (.97)      (.96)        (.99)
Wall chart vs. Ortho-Rater    .78        .84        .77          .80
  (Landolt)                   (.98)      (1.00)     (.96)        (1.00)


1 Correlations corrected for attenuation are given in parentheses.


.98. These data offer little evidence to support
the existence of a machine factor or
“apparatus accommodation factor” specific to
Ortho-Rater presentation.
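
The parenthesized values in Table 3 are consistent with the usual correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The Python sketch below applies that correction to the scoring method C figures; capping the result at 1.00 is an assumption made to match the table entries, and the function name is hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Observed correlation divided by the geometric mean of the two reliabilities.

    Assumption: values above unity are reported as 1.00, as in Table 3.
    """
    return min(r_xy / sqrt(r_xx * r_yy), 1.0)


# Method C reliabilities from Table 2: letter .88 (wall), .92 (Ortho-Rater);
# Landolt .75 (wall), .85 (Ortho-Rater).
letter_test = correct_for_attenuation(0.85, 0.88, 0.92)
landolt_retest = correct_for_attenuation(0.84, 0.75, 0.85)
print(round(letter_test, 2), round(landolt_retest, 2))  # 0.94 1.0
```

Corrected values this close to unity are what one would expect if the two methods of presentation measure the same ability and differ only through errors of measurement.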


Discussion 


The finding that Ortho-Rater tests are
more reliable than wall chart tests presents a
problem for interpretation. The superiority
of instrument presentation may be due to
lowered visual distraction with limitation of
the surround, or to some other advantage of
subject or stimulus characteristic leading to
greater constancy of conditions. It is believed
that the difference in reliability between
Ortho-Rater and wall chart presentation
will be even greater in operational testing
than that found here. It is well known
that the conditions of wall chart testing differ
widely from place to place.

If visual angle, background luminance, and 
contrast between object and background are 
equated as closely as possible between Ortho- 
Rater and wall chart presentations, closely 
equivalent measures are obtained. The difficulties
of the tests and the variability of
scores are similar. When the correlations of 
the tests are taken into consideration, the 
methods appear to measure the same visual 
abilities. 


Summary 


This study presents a comparison of visual 
acuity scores obtained on Ortho-Rater plates 
with visual acuity scores on duplicate wall 
chart tests. A total of 117 subjects were 
tested binocularly and retested two weeks 
later. Letter and modified Landolt ring tar- 


gets were employed. Previous practice had 
been given on other mesopic and photopic 
wall chart and Ortho-Rater plates before the 
tests under consideration were given. 

The following results were obtained: 

1. The two methods of presentation were 
of equal difficulty, except for slight discrep- 
ancies introduced by photographic reduction. 

2. The reliabilities of the Ortho-Rater tests 
were significantly higher than those of the 
wall chart tests. 

3. The correlations between Ortho-Rater 
and wall chart tests were about as high as 
the reliabilities of the tests themselves. When 
corrected for attenuation, these correlations 
approach unity. No evidence is afforded, under
these conditions, of a machine or
“apparatus accommodation” factor affecting
Ortho-Rater acuity scores.


Received April 11, 1953. 


References 


1. Adams, J. K., Beier, D. C., and Imus, H. A. A
test-retest reliability study of the Bausch and
Lomb Ortho-Rater with naval personnel.
OSRD Report No. 3969, Aug. 1, 1944,
Applied Psychology Panel, NDRC, 1-32.

2. Adjutant General’s Office. Studies in visual
acuity. Wash., D. C.: U. S. Gov’t. Printing
Office, 1948.

3. Altman, A. and Rowland, W. M. Measures of
acuity with optical simulation of distance.
Quart. Rev. Ophthal., 1952, Vol. 8, No. 1.

4. McNemar, Q. Psychological statistics. New York:
Wiley, 1949.

5. Sloan, L. L. Measurement of visual acuity.
Arch. Ophthal., 1951, 45, 704-725.

6. Sulzman, J. H., Cook, E. B., and Bartlett, N. R.
The reliability of visual acuity scores yielded
by three commercial devices. J. appl. Psychol.,
1947, 31, 236-240.


Effect of Illumination on Scores with Instrument Acuity Tests 


Newell C. Kephart and Stanley Deutsch 


Occupational Research Center, Purdue University 


Tests for the measurement of
visual acuity have often been administered
under conditions which were not held constant,
both within tests and between tests.
Standard procedure has been violated
in many different ways: varying light
conditions, the uncontrolled mixture of artificial
light and sunlight, angles of view, and the
like. Self-contained instruments such as the
Bausch and Lomb Ortho-Rater help to overcome
these undesirable deviations in practice
which serve to reduce the validity of the
measurements (1, 2).

However, it is realized that despite precautions,
occasional variations in the source of
current in the industrial establishment may
lead to greater or lesser illumination even
in the Ortho-Rater than that considered
as the standard for testing visual skills. This
study was designed to determine whether
or not such deviations of lighting present serious
drawbacks to obtaining accurate measures
of visual acuity.


Procedure 


A total of 55 college students in a course
in psychology were tested on a standard
Ortho-Rater, using standard practice for
administration (3). The illumination in the
instrument was varied by means of a Variac
rheostat. The only deviations from standard
testing procedure were these changes in the
illumination levels which were artificially induced
external to the instrument to simulate
conditions which might occur in an industrial
situation.

Acuity was measured at both the near and
far distance and a total of five levels of
illumination was used at each distance. Since the
Ortho-Rater had just been checked and the
bulbs replaced where necessary, the normal
level of illumination was used as a base.
Using an illumination level meter to
determine amount of illumination, three lesser
amounts of illumination and one greater quantity
were obtained by means of the rheostat.
This procedure provided illumination for the
targets in the following percentages of standard
for far acuity: 12, 56, 75, 100, and 125.
For near distance the percentages were: 10,
46, 75, 100, and 125 of the normal illumination.

The stimuli used were the “both eyes” and 
the “right eye” targets. The left eye was oc- 
cluded at all times by a mechanical device 
built into the machine. The targets and levels 
of illumination were presented in a random 
order throughout the experiment, and were 
changed for each subject. Although only the 
right eye saw the material the “both eyes” 
targets were used in addition to the “right 
eye” targets in order to provide additional 


data. 
Results 


The mean acuity scores obtained by the 55 
subjects are shown in Table 1. For the target 
at the optical distance of 26 feet, statistically 
significant differences at the 1% level of con- 
fidence were obtained only for the targets em- 
ploying 10 and 12% of the standard illumina- 
tion. For all of the presentations where the 
illumination was 46% or greater, no statis- 


Table 1 


“F” Ratios Between Acuity Test Scores at Various
Levels of Illumination and a Standard
Level of Illumination


Target
Level of Both Eyes Right Eye
Illumination Far Near Level Far Near Level
10% 11.21 1% 11.27 1%
12% 5.91 1% 3.37 1%
46% 7.38 1% 3.89 1%
56% 1.00 N.S. .65 N.S.
75% 1.35 N.S. 1.18 N.S.
75% .54 N.S. .61 N.S.
125% .74 N.S. 1.56 N.S.
125% 1.56 N.S. 1.29 N.S.



tically significant differences were found for 
this group. 

The decrease in illumination for the near 
targets (distance of 13 inches) was somewhat 
more critical. When the lighting was reduced 
as low as 46% or less of standard, differences 
were obtained at the 1% level of confidence. 
No significant differences were found for those 
targets receiving 75% or more illumination. 

When 125% of the prescribed lighting was 
employed, the differences from normal illumi- 
nation were not significant. 


Discussion 


The study was undertaken to determine 
whether significantly different results would 
be obtained if the levels of illumination were 
increased or decreased from that established 
by the manufacturers of the Ortho-Rater. 
Several factors which might potentially in- 
validate the results of instrument acuity tests 
can be hypothesized. These might consist of
temporary fluctuations in the power supply in 
a factory test room, reduced efficiency of the 
Ortho-Rater light sources prior to examina- 
tion, or any feature which might change the 
illumination level within the instrument. 

Effects of such deviations, however, appear 
to be minimal. A decrease of more than one- 
fourth is required before the differences in 
results become meaningful. 

It is of interest to note that near acuity 
suffers more readily than far acuity. Whether 
this indicates that near visual tasks require 
more illumination than far visual tasks can- 


not be answered from the information pre- 
sented here. 


The 25% increment in illumination used to 
augment the standard at no time produced a 
significant change in visual acuity scores. 

It is a standard procedure for Ortho-Rater 
operators to check the operational condition 
of their instrument prior to use. As a con- 
sequence, chances of using a bulb operating 
at only 50% of efficiency are slight. Devia- 
tions in the power supply of this magnitude 
are instantly obvious, and testing should 
await the return of the more nearly normal 
level of illumination. This study demon- 
strates that minor changes in illumination do 
not appear to have any real effect upon the 
test results. 


Summary 


1. Decreases in illumination as great as
one-fourth of standard did not affect scores on 
visual acuity with the Ortho-Rater. 

2. Increases in illumination as great as one- 
fourth of standard did not affect these acuity 
scores.

3. Near acuity scores suffer to a greater 
degree than far acuity scores when illumina- 
tion is decreased more than 25 per cent. 


Received March 18, 1953. 


References 


1. Feinberg, R., and Wirt, S. E. Visual acuity in
relation to illumination in the Ortho-Rater.
J. appl. Psychol., 1947, 31, 406-412.

2. Jobe, F. W. Instrumentation for the Bausch and
Lomb Industrial Vision Service. Bausch and
Lomb Magazine, 1944, 20, 6-7, 14-15.

3. Standard Practice in the Administration of the
Bausch and Lomb Occupational Vision Test
with Ortho-Rater. Bausch and Lomb Optical
Co., 1944.




Applied Psychology in Action 


Psychological Research in Personnel Administration 


Joseph G. Colmen 


Civilian Personnel Research Branch, Headquarters, USAF 


Industrial management has undoubtedly 
been skeptical about the value of the personnel
psychologist as a direct part of its operations,
payrolled as its job evaluation, training,
organization and methods, and other functions
are. Yet a large number and variety of
management problems can be attacked by the 
application of the specialized skills of the re- 
search psychologist. And in most cases not 
only can they provide the most valid solu- 
tions and recommendations but can do this in 
a manner which will please even the most 
practical administrator. To do this, it seems 
important for the research psychologist to be 
close enough to the management and opera- 
tions of the organization so that he can sense 
needs for research in day-to-day problems. 
And he can make acceptable recommenda- 
tions for application of research results in the 
same setting. The possibilities for success 
are greater, of course, where the relationship 
between administrator and psychologist is a 
close and continuing one. 

The Civilian Personnel Research Branch
(CPRB) of the U. S. Air Force Headquarters
is in the fortunate position of approximating
this ideal. This Branch conducts psychological
research originating from everyday problems
of the civilian personnel program of the
Air Force.

Many problems had pointed to the need 
for test development when the CPRB was es- 
tablished in 1950. One of the urgent prob- 
lems was to find more objective means for de- 
termining supervisory potential to improve 
the level of supervisory proficiency within the 
Air Force, especially in the important area of
human relations. Another problem was to
ind an objective measure of potentiality for 
administrative work as a basis for selecting 
Junior employees for specialized development 
and training. 

These problems demanded pioneering research in areas where success
had been previously only hopefully promising. Fortunately,
management recognized that scientific research took time. It did not
insist on “quick” results at the sacrifice of adequate research.

As work on these basic problems progressed, 
other needs became evident. It was noted 
that from time to time Air Force installations 
were conducting attitude surveys among ci- 
vilians without the benefit of instruction in 
accepted principles of public opinion polling. 
Guidance was needed in formulating ques- 
tions, sampling employees, conducting sur- 
veys, and analyzing and interpreting results. 
A compact, highly functional guide to the 
conduct of civilian employee attitude surveys 
was prepared to correct these deficiencies to- 
gether with a questionnaire with general ap- 
plicability at all Air Force bases. 

Because test administration would be un- 
dertaken by all Air Force bases upon comple- 
tion of the research, a guide on establishing 
test administration facilities and on the best 
methods of administering and scoring tests 
was developed. Also, regulatory material 
was published to assure effective coordina- 
tion and highest quality of research through- 
out the civilian personnel program and to 
stimulate necessary personnel research where 
resources permitted. 

To conserve research resources, use was
made wherever possible of the work of other 
organizations with adaptation and special 
check and validation being accomplished 
within the Air Force. On the other hand, no 
test is authorized for use without specific 
validation on the group for which it is in- 
tended or for a purpose other than that for 
which it was validated. 

With what has been a very heavy schedule, 
new developments in technical areas have still 
been possible. Readers of this paper trained 
in test theory will recognize that depth, 
quality, and originality of research have not 
been sacrificed even in the applied setting in
which it is conducted. A nomograph for determining
significance of difference between
percentages for finite populations was devel- 
oped for handy use by statistically untrained 
people in analyzing attitude survey data. An 
“unconventional” key and rationale for it 
was hypothesized and verified as a valid key- 
ing technique for personality test items. The 
methods of weighting tests by precise Wherry- 
Doolittle beta weight methods or Wherry- 
Gaylord integral weighting were found to be 
less appropriate under certain circumstances 
than mere unit weighting of tests so selected. 
Later discussions with Dr. Wherry have con- 
firmed that the number of cases and size of 
test intercorrelations do affect the stability 
of weights derived by these methods. 
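
The nomograph mentioned above is not reproduced in this note, and the text does not state the formula on which it rests. As a rough sketch only, the following Python shows one standard textbook way to test the difference between two percentages obtained from independent samples drawn without replacement from finite populations, applying a finite population correction to each variance term. The function names and the survey figures are hypothetical and are not taken from the CPRB work.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Return (N - n) / (N - 1), the usual correction for sampling without replacement."""
    if not 1 <= sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    if population_size == 1:
        return 0.0
    return (population_size - sample_size) / (population_size - 1)


def z_for_difference(p1, n1, pop1, p2, n2, pop2):
    """Critical ratio for the difference between two independent sample proportions."""
    var1 = p1 * (1.0 - p1) / n1 * finite_population_correction(pop1, n1)
    var2 = p2 * (1.0 - p2) / n2 * finite_population_correction(pop2, n2)
    standard_error = math.sqrt(var1 + var2)
    if standard_error == 0.0:
        raise ValueError("standard error is zero; there is no sampling error to test against")
    return (p1 - p2) / standard_error


# Hypothetical survey: 60 per cent favorable among 100 of 200 employees at one base,
# 50 per cent favorable among 100 of 200 employees at another base.
z = z_for_difference(0.60, 100, 200, 0.50, 100, 200)
print(round(z, 2))
```

Under the usual normal approximation, an absolute value of the ratio beyond about 1.96 would be read as significant at the 5 per cent level. Because each sample here covers half of its population, the corrected ratio is larger than the one an infinite-population formula would give for the same percentages.
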

Awareness of the specialized skills brought to the organization by
research psychologists soon led personnel specialists in other
program areas to seek the services of the Civilian Personnel
Research Branch. Typical assignments in other areas of applied
research were: to determine the advantages and limitations of
functional music in the work setting; to develop sampling methods in
connection with interviews used as part of an evaluation of the
effectiveness of civilian personnel programs at Air Force bases; to
determine the best sources and methods of ascertaining supervisory
training needs as a basis for developing training course content; to
determine the extent and causes of turnover among civilian personnel
office professional staffs; to develop a battery for selection of
fiscal accounting clerks; to evaluate effectiveness of employee
suggestion systems; to make recommendations concerning the most
valid and reliable information about personal characteristics of
candidates for positions from interviews, vouchers and other
screening methods; with the Civil Service Commission and four other
federal agencies, to develop selection methods for reducing the
number of emotional misfits finding their way into overseas jobs;
and others.

Though research has constituted the major 
responsibility of the CPRB, it has never di- 
vorced itself from the personnel administra- 
tion of which it is a part, so that research 
needs are perceived by the psychologist in 
the operating and staff civilian personnel 
problems with which he is in close contact.
Nor is the work completed when research 
findings are reported. Instead, implementa- 
tion of those findings in the practical setting 
of the operating civilian personnel office be- 
comes in large part also the responsibility of 
the researcher. And he is kept informed of 
and asked to comment on personnel adminis- 
tration programs, policies and procedures 
which are under consideration in the
Directorate of Civilian Personnel.

The satisfaction of management with the accomplishments of its
personnel research function is seen in continued support and growing
acceptance. By keynoting economy and improvement of operations, data
have been accumulated showing how the results of the work of CPRB
have much more than offset its modest cost. It is interesting to
note that whether or not a personnel research activity is
maintained, management will conduct research studies. Staffing with
personnel specifically trained for such work pays dividends, if in
no other way than in making such research sufficiently sound to
assure management that the conclusions may be applied with
confidence.


Time Limit versus Work Limit Methods of Test Administration 


E. B. Knauft 
Aetna Life Affiliated Companies, Hartford, Connecticut 


The majority of mental alertness tests used in the employment
situation are speed tests in which the time factor plays an
important role. Those of us working in business and industry are
sometimes asked whether such speed tests unduly penalize the “slow
and accurate individual” who might be a very satisfactory worker.




Some information relative to this question 
was obtained in the process of revising the 
LOMA-1 Test. This is a 15-minute general 
mental ability test of 236 items (including 
number series, same-opposites, analogies, and 
general information) developed by the Life 
Office Management Association for use in 
member companies. Many studies during the
past 15 years have demonstrated the validity
of this test as an aid to selection of clerical 
employees in insurance companies. A revised 
form of this test was recently administered to 
employees who took the test with the usual 
15-minute time limit and then were permitted 
to complete the remaining items with time 
noted, but no time limit. 

Data based on 235 employees in four different life insurance
companies showed that scores obtained in the 15-minute time limit
correlate +.88 with scores on the entire test obtained under untimed
conditions. The mean time required to complete all items was
30.1 minutes.
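
The coefficient reported above is presumably an ordinary product-moment correlation between each employee's time-limit score and his score on the complete test, although the note does not say how it was computed. A minimal Python sketch of that computation follows; the six pairs of scores are invented for illustration and are not the LOMA data.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores for six employees (not from the study):
# items correct within the 15-minute limit, and items correct with no time limit.
timed_scores = [88, 102, 95, 120, 76, 110]
untimed_scores = [150, 171, 160, 190, 139, 175]

r = pearson_r(timed_scores, untimed_scores)
print(round(r, 2))
```

A coefficient near +1 means that employees keep much the same standing in the group under both conditions, which is the sense in which the note interprets its figure of +.88.
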

For this sample, it appears that individu- 
als performing relatively poorly on a mental 
alertness test under timed conditions will not 
appreciably change their standing in the 
group when permitted to complete the test 
with no time limit. The 235 employees rep- 
resent a sample of persons hired within the 
past five years and still employed by the four 
companies. The great majority are between 
the ages of 18 and 35 and are high school graduates.


Employee Opinion Surveys 


“Why should we invite employees to criticize us? They do enough of
that anyway without being asked.”

That’s the attitude of many top management officials. . . .¹

But San Diego Gas and Electric Co. is one company that calculated
the risk. . . . Now it says that it’s glad. One reason SDG&E is
happy with the results is that the employees gave the company a
pretty good rating.

But they received some very specific suggestions. . . . A total of
3,380 unfavorable comments and 1,290 suggestions were made by 2,178
employees. . . . The company had determined in advance to do
something about reasonable complaints and it followed up quickly.
Top management gave full approval and support. Adequate assurance of
anonymity was provided by the Industrial Relations Section of
California Institute of Technology, which conducted the survey,
getting a 99 per cent return. (Condensed from Business Week,
November 7, 1953, p. 167.)

¹ See McMurry, R. N. Management's reactions to employee opinion
polls. J. appl. Psychol., 1946, 30, 212-219. (Reference added by
Editor.)


Book Reviews 


Lawshe, C. H. Psychology of industrial rela- 
tions. New York: McGraw-Hill Book Co., 
1953. Pp. vii + 350. $5.50. 

During the past decade psychology has had 
a rapidly increasing impact upon industrial 
management through two developments: (1) 
the introduction of psychologists into staff, 
and occasionally line, positions; (2) the train- 
ing of various levels of management in the 
principles of human behavior. This has led 
to two types of publication: those for use in 
training industrial psychologists and those 
which present the findings of psychology to 
the non-psychologist. This volume is an ex- 
ample of the latter. 

Seven authors are involved—two from Pur- 
due (Lawshe & E. J. McCormick), one each 
from the Army (A. J. Drucker), Air Force 
(W. F. Long), and Navy (E. E. Dudek), 
and two from industry (K. Oliver & R. I. 
Dawson). The fifteen chapters deal with the 
usual topics: principles of human behavior, 
motivation, attitudes, placement, training, su- 
pervision, employee complaints, counseling, 
efficiency, wage administration, employee and 
employee-management relations. Although 
the chapters were written by individual au- 
thors, they are remarkably similar in style, 
level of reading difficulty, and point of view 
and there appears to be relatively little over- 
lap of content except when desirable. In 
general, the writing is clear and direct, with 
attempts to define psychological terms by 
means of industrial examples. Each chapter 
concludes with a set of references to journal 
articles or psychological texts. 

So much for the over-all structure of the 
book. The question remains—how well has
the goal of communicating industrial psychol- 
ogy to the non-psychologist been achieved? 
The only true answer to the question can come 
from actually measuring what effect the study 
of this book by non-psychologists has had 
upon their knowledge, skills, and attitudes in 
human relations. Lacking research findings, 
the reviewer can merely speculate as to the 
book’s probable value. 

The problem in writing for the non-psychologist is, of course,
deciding what one wants to communicate. There are several
possibilities: (1) psychological findings with or without the
underlying evidence; (2) suggestions for what to do in industrial
settings, with or without reference to the principles being applied;
(3) a point-of-view, derived from psychological principles of human
behavior.

In this reviewer's opinion, the authors have 
done a creditable job in presenting a large 
body of facts and principles, backed up with 
sufficient references to research literature. 
However, certain areas to which psychologists
have devoted considerable thinking and research
are inexplicably omitted or merely
mentioned in passing, viz., industrial safety,
democracy in management, executive development,
employee rating methods, characteristics
of the learning curve, transfer of training.

It is difficult to evaluate the “how-to-do-it” aspect of this book.
There appears to be an attempt to present principles, not specific
applications; the discussions tend to include more of the “it’s
important to take the following things into account” type of
statement than to describe how to take them into account. There is a
somewhat too frequent dependence upon a brief raising of a question
or listing of factors and then a reference to a bibliographic item
for the details, assuming that the reader will go to the sources.

Possibly psychology really cannot give very many specific
suggestions for industrial practices, and its real contribution is
in methodology and point of view. If so, this book serves a useful
purpose in getting across to the non-psychologist the basic attitude
towards human problems which characterizes psychology, e.g.,
“emphasis upon the people that work rather than upon the product
they make,” the importance of satisfying basic human needs, the need
for a “basic respect for human beings and a genuineness of purpose
in dealing with employees.” To the extent that publications of this
type stimulate operative personnel to examine their fundamental
attitudes toward human behavior they will facilitate the acceptance
of programs developed by the industrial psychologist and raise the
level of daily interpersonnel relations.

A. S. Thompson


Teachers College, 
Columbia University 




McFarland, Ross A. Human factors in air 
transportation. New York: McGraw-Hill, 
1953. Pp. xv + 830. $13.00. 

The complexity of the airplane and the extension of its performance
characteristics have subjected those occupying this device to a
variety of environmental stresses. Extremes and variations of
temperature and accelerative forces are commonplace in military
flight. The tasks of maintenance and operation have required the
development of skills unthought of fifty years ago. In many ways,
the airplane has provided a laboratory and a never-ending set of
problems for the engineer, the physiologist, and the psychologist.

Drawing upon his unique and extensive acquaintance with practically
every aspect of civil and military aviation, Dr. McFarland has
written an encyclopedic volume of over 800 two-column pages,
illustrated with well-selected tabular, graphic, and pictorial
displays. Each chapter is followed by a selected bibliography.

Human Factors covers thoroughly the areas [...] maintenance of
proficiency, and [...] which are implied by its title as well as
such topics as sanitation and health in airline operations, the care
of passengers, and a description of medical programs. The discussion
of physical factors involving circulatory and sensory phenomena is
unusually comprehensive.

The reviewer is impressed by the clear style, efficient
organization, and excellent typography of the book. This work would
seem to be not only a landmark in the area for which it is intended
but also a valuable source of information for those concerned with
most aspects of personnel psychology. Its chief drawbacks are likely
to be its size and perhaps its price of $13.00, although the latter
is certainly modest for so large a book aimed at a limited audience.

George K. Bennett

The Psychological Corporation,
New York, New York

Husband, Richard W. The psychology of
successful selling. New York: Harper and
Brothers, 1953. Pp. 306. $3.95.

This book is directed to all salesmen to aid them in their daily
work. Its emphasis is on sales tactics, from finding your prospect
through approaching him and overcoming his resistance to closing the
sale. There is also a short section concerning the selection of
salesmen, helping him to compare his traits with those of successful
salesmen.

This book is not intended to be a pro- 
fessional book for psychologists; rather it 
is deliberately designed to be easy, informal 
reading without technical language or ref- 
erence to experiments or statistics. It is ad- 
mittedly based upon reading leading books 
by professors of business and sales personnel, 
sales journals, trade publications, newspapers, 
popular magazines, training manuals of cer- 
tain companies, and the author’s personal 
experience. Drawing upon these sources, the 
book presents a series of rules, principles, 
steps, and laws on how to be effective in each 
phase of selling. These are liberally illus- 
trated with clever examples and entertaining 
anecdotes. Sprinkled with this is advice and 
moralizing based on the personal opinion of 
the author. 

Thus while the author claims this is the 
first general book on salesmanship written by 
a professional psychologist, it is certainly free 
from the concepts and language of the psy- 
chologist. You will find no discussion per se 
of motivation, adjustment, individual differ- 
ences, learning, perception, and so forth. This 
was apparently deliberately omitted in order 
to make the book more appealing and read- 
able for salesmen. There are many para- 
graphs, however, in which the oversimplifica- 
tion has led to statements which the reviewer 
could not accept. The book is a challenge to psychologists in that
it reveals a large area in which practical applied research can
still pioneer. Many statements of the book are based on what “should
be” and reasoning from analogy; it would be quite difficult to
support them with evidence from scientific references.
In general, there is little in the book to recommend it even to
sales managers or salesmen over the many other volumes written in
this field.

Brent Baxter


The Prudential Ins. Co. of America, 
Newark, New Jersey 




Jahoda, Marie, Deutsch, Morton, and Cook, 
Stuart W. Research methods in social rela- 
tions, with especial reference to prejudice; 
Vol. I: Basic processes; Vol. II: Selected 
techniques. New York: The Dryden Press, 
1951. Pp. x + 421, x + 423-759. $6.00
(set). 

As indicated in the preface of this pub- 
lication, “This book is in many ways the out- 
come of group effort. The idea of producing 
it arose in a group; it is presented under the 
auspices of a group; its production was 
financed by several groups; it had the edi- 
torial guidance of a group; and it was pro- 
duced by a group.” The sponsoring agency 
for the book was The Society for the Psy- 
chological Study of Social Issues. 

The two volumes themselves show the im- 
pacts of their sponsorship, and of the many 
hands which have been laid upon them. 
There is a sense of urgency in the book’s 
treatment of problems of social intolerance 
and discrimination, and an implication of 
mild, but persistent, exhortation to the reader 
to take constructive steps in combating these 
evils. For the scientific reader the proper 
course is to be found in “action research” 
(participative research directed toward the 
solution of tangible problems), and for the 
social practitioner the recommended course 
is cooperation with the scientific investigator. 

In spite of the instances of apparent over- 
earnestness and occasional naiveness which 
occur in the book, it still remains a useful and 
informative document. The sections on research
planning, on practical issues in research,
and the uses and applications of research
results are admirable. The second
volume, which consists for the most part of 
separate papers by various contributors, 
should also be noted. Readers desiring short, 
but critical and dependable resumés of topics 
such as scaling concepts, the use of panels, 
and sociometric analyses will find this vol- 
ume a valuable reference. 

The avowed purpose of the book was to 
teach two audiences: the conductors of social 
research and the users of social research. In 
the reviewer’s opinion neither of these specific 
goals has been satisfactorily met, but an- 
other, equally legitimate goal has been. The 
book is too superficial and given over to
standard illustrations to be of much help to
the practicing researcher, and seems to be too 
academic and technical to have much appeal 
to the practical man-of-affairs. But the book 
does have a thoroughness, and an informa- 
tive and dependable quality which would 
make it an excellent source book for non- 
specialists, and for students who wish to gain 
a brief, but competent and comprehensive 
overview of this field of research. 
Harrison G. Gough 


University of California, 
Berkeley 


Coombs, C. H. A theory of psychological
scaling. Ann Arbor: University of Michigan
Press, May, 1952. Pp. vi + 94. $1.75.

If you’ve had a hard day measuring attitudes, don’t expect this
small monograph to provide an evening’s relaxation. It’s packed from
cover to cover with non-superfluous material. It is to the author's
credit that he has said so much in so short a space; nevertheless,
persons lacking expertness in scaling theory will not digest the
contents properly. On the other hand, scaling theorists will accept
this tidbit as a juicy morsel and will soon be looking for more.

The theory presented here has been in the process of formulation for
four years and represents the contributions and criticisms of many
scholars. It has undergone continuous modification in response to
these criticisms and will undoubtedly undergo more. However, its
publication now “is necessary for the presentation of certain
consequences of practical interest to psychologists and social
scientists” (page v).

Roughly speaking, the presentation is made in four parts: (1) a
general discussion of the aspects and problems of psychological
measurement; (2) a listing and brief explanation of the definitions
and postulates on which the theory is based; (3) the development and
interpretation of genotypic and phenotypic parameters; and (4)
derivations of the consequences of various genotypic conditions and
the application of the theory to several simple experiments.

The effort has been to present a mathematical model which will
satisfy observed behavior. As such, the theory must resolve certain
fundamental issues such as the question of defining a psychological
trait in a mathematical sense. In order to resolve problems of this
sort, parallel systems have been developed, one formulated at the
genotypic level and referring to an individual’s inferred, [...]
abilities and behavior—the other formulated at the phenotypic level
and referring to an individual's observed, manifest behavior. The
ultimate objective of the system, then, is to treat information
obtained from a set of phenotypic observations so as to allow
inferences at the genotypic level.

The first five chapters present the theoretical framework for the
realization of this objective. It is in Chapter VI that Dr. Coombs
discusses the area of joint scales—the final relating of the
phenotypic to the genotypic. And it is here that he must confess
partial defeat, for he comes face to face with the problem of the
direction of the inferences we, as theorists, wish to make. Thus,
starting with certain genotypic conditions, it is demonstrated what
the consequences must be in terms of manifest behavior.
Unfortunately, in the practical situation, it is only rarely that we
may apply these relationships in the opposite direction. From
characteristics of manifest behavior, we desire to infer [...] of
behavior at the genotypic level. The author sums up the difficulty,
“It has not been shown that for this given set of parameters or
characteristics of the manifest it is necessary that these and only
these conditions must characterize the genotypic level” (page 52).
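
The difficulty of direction described above can be shown with a toy example. The sketch below is not Dr. Coombs's formal system. It assumes, purely for illustration, a single underlying continuum on which both stimuli and an individual's "ideal" position are located, with the individual's observed preference order running from the nearest stimulus to the farthest. All positions and names are hypothetical.

```python
def preference_order(ideal_point, stimuli):
    """Rank stimulus labels from most to least preferred by distance from the ideal point."""
    return sorted(stimuli, key=lambda label: abs(stimuli[label] - ideal_point))


# Assumed underlying (genotypic) positions of four stimuli on one continuum.
stimuli = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 7.0}

# Forward direction: each genotypic condition implies exactly one manifest order.
order_first = preference_order(2.4, stimuli)
order_second = preference_order(2.2, stimuli)

# A quite different assumed configuration of stimuli and ideal point.
other_stimuli = {"A": 0.5, "B": 2.0, "C": 4.5, "D": 9.0}
order_third = preference_order(2.0, other_stimuli)

# Reverse direction: the same manifest order arises from all three conditions,
# so the observed order alone does not single out one genotypic condition.
print(order_first, order_second, order_third)
```

Going from underlying positions to an observed order is mechanical, whereas the observed order is consistent with many sets of underlying positions. This is the sense in which inference in the opposite direction is only rarely possible.
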

The author openly states that the theory is not in final form. By
implication, it is his hope that this publication will initiate
interest resulting in a wider range of development of the theory in
both its abstract and real [...]. To this end, the monograph
represents a good start.

Marvin D. Dunnette

The University of Minnesota


New York Academy of Medicine and the
Josiah Macy, Jr. Foundation. (Transactions
of the Conference on) Morale—and
the prevention and control of panic. New
York: New York Academy of Medicine,
[...]. Pp. 75. No price cited.

The publication is aimed at inspiring widespread consideration and
study of morale and panic. Its audience is not defined but it
appears to be officials who may have responsibility for controlling
public morale and panic.

The purpose of the conference was to ex- 
plore available knowledge of the problems 
and discover a way of implementing the 
pooled knowledge through action. Conferees 
were 1 Ph.D., a psychologist in a Veter- 
an’s Hospital; 8 M.D.’s, psychiatrists from 
schools, state and national medical associa- 
tions and governmental agencies; and 2 rep- 
resentatives of public information media, an 
official from a radio broadcasting company 
and the editor of a city newspaper. 

The conferees earnestly advocate study of 
the problems and use of the resulting find- 
ings. Meerloo, formerly chief of the Psycho- 
logical Department of the Netherlands Army, 
contributed most of the specific references to 
evidence on the factors affecting and means 
of controlling panic, drawn from his own 
work during World War II. Herbert Brucker, 
Editor of the Hartford Courant, emphasized 
the belief that factual, play-by-play report- 
ing of the news events as they occur is prob- 
ably the greatest contribution public news 
media can make. He argues against at- 
tempted manipulation of the news and ex- 
hortationary releases from government offi- 
cials as having less than good effect upon 
public morale. 

Much conference time was devoted to re- 
cital of personal experiences and to reference 
to incidents ranging from biblical events to 
postwar reactions of German war leaders. 
Various ways of controlling morale and panic 
with varying degrees of success were cited 
with Meerloo’s practical findings being of 
greatest interest. 

To focus the attention of public officials, 
educators, and research personnel on these 
problems appears desirable. Perhaps a joint 
attack, by experimental study of isolated fac- 
tors and by concurrent multifactor study 
with techniques such as were described dur- 
ing the first meeting of the Operations Re- 
search Society of America, would be produc- 
tive. 

In summary, there was little experimental 
evidence or firm knowledge about causes and 
control of public morale and panic disclosed 
during the conference. It is not clear that 
the aim of inspiring widespread consideration
and study of morale and panic will be attained
by publication of the transactions of
the conference. 


Clark L. Hosmer 
United States Air Force 


Traxler, Arthur E., Jacobs, Robert, Selover, 
Margaret, and Townsend, Agatha. Intro- 
duction to testing and the use of test re- 
sults in public schools. New York: Harper 
and Brothers, 1953. Pp. 113. $2.50. 


This book is designed to serve as a “prac- 
tical, down-to-earth handbook for schools be- 
ginning the use of objective tests, for teacher 
discussion groups, for in-service training pro- 
grams, for persons who have had experience 
with tests but who desire to brush up on the 
simpler fundamentals of testing, and for in- 
troductory classes in tests and measurements.” 
It is a revision of Educational Records Bul- 
letin No. 55, Introduction to Testing and the 
Use of Test Results (Educational Records 
Bureau, 1950) which was prepared primarily 
for independent schools. 

A general discussion of the role of objec- 
tive tests is followed by sections on planning 
a testing program, selection, administration, 
and scoring of tests, and analysis, interpreta- 
tion, recording, and use of test results. Ele- 
mentary concepts from test theory and sta- 
tistics are presented in context. Illustrative 
material is utilized extensively; interpreta- 
tions of data for individual students and 
classes, and copies of score reports, cumula- 
tive record forms, and the like make up a 
substantial portion of the book. Through- 
out, the authors give detailed attention to 
the limitations of objective tests and to cau- 
tions which should be exercised in the inter- 
pretation of test results. 

Each chapter includes a list of references 
for readers who wish to go beyond this intro- 
ductory handbook. For such readers, sup- 
plementary information on score interpreta- 
tion is likely to be of special concern; the use 
of test results for descriptive and compara- 
tive purposes is treated more explicitly than 
is their application in predicting future per- 
formance. 

This brief, nontechnical book should be 
distinctly useful to the groups of readers toward
whom it is directed. Despite its title,
the revision seems equally appropriate for
public and independent schools. From the 
standpoint of the former, the more detailed 
discussions of test selection and program 
planning included in the revised edition 
should be of particular interest. 

Marjorie Olsen

Educational Testing Service, 
Princeton, New Jersey 


Powers, Edwin and Witmer, Helen. An experiment
in the prevention of delinquency:
The Cambridge-Somerville youth study.
(With foreword by Gordon W. Allport.)
New York: Columbia University Press,
1951. Pp. xliii + 649. $6.00.

The book is devoted to the description and evaluation of a program,
or as the authors call it, an experiment in the prevention of
juvenile delinquency. The program had its origin in an idea
formulated by Dr. Richard Clarke Cabot. He believed that the
delinquency of boys could be prevented were it made possible for
them to come under the constructive influence of friendly
counselors.

The study was begun in 1935 and terminated in 1945. In design, the
study as initially conceived was in the best scientific tradition.
One group of boys was to be given the benefits of counseling while
another group of boys matched on several variables was to remain
untreated. Members of the two groups were selected from lists of
names provided by various sources. School authorities nominated boys
considered as difficult and troublesome as well as boys regarded as
adjusted. Court records were examined for names of potential study
subjects. Probation officers, police officers, social agencies,
etc., were asked to submit names. Approximately 2,000 names were
obtained. All boys who had passed their twelfth birthday during the
period between referral and investigation were eliminated. Boys who
could not be found or who were unavailable were also eliminated. The
names of those remaining after this screening were submitted to
three experts (not members of the project staff) for rating on an
eleven-point delinquency probability scale. The rating process,
which took fifteen months to complete, provided 782 candidates for
the experiment. Two psychologists were then asked to match one boy
with another on such variables as health, [...] personality, home,
neighborhood, and delinquency prognosis. The toss of a coin
determined which of the matched boys would be placed in the
experimental group. Six hundred and fifty boys were selected and
divided into two groups of 325 each.
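
The assignment step described above, matching boys in pairs and then letting the toss of a coin decide which member of each pair is treated, can be sketched in a few lines of Python. This is an illustration of the general procedure only. The identifiers, the seed, and the use of a pseudo-random generator in place of an actual coin are assumptions, not details from the study.

```python
import random


def assign_matched_pairs(pairs, rng):
    """Split each matched pair between experimental and control groups by a coin toss."""
    experimental, control = [], []
    for first, second in pairs:
        if rng.random() < 0.5:  # "heads": the first member of the pair is treated
            experimental.append(first)
            control.append(second)
        else:  # "tails": the second member of the pair is treated
            experimental.append(second)
            control.append(first)
    return experimental, control


# Hypothetical matched pairs (the study itself formed 325 such pairs).
pairs = [
    ("boy_01", "boy_02"),
    ("boy_03", "boy_04"),
    ("boy_05", "boy_06"),
    ("boy_07", "boy_08"),
]

# A seeded generator makes the illustration repeatable; the seed is arbitrary.
experimental, control = assign_matched_pairs(pairs, random.Random(1937))
print(experimental)
print(control)
```

Because chance alone decides within each pair, the two groups stay equal in size and balanced on the matching variables, and any remaining difference between pair members is as likely to favor one group as the other.
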
Boys in the experimental group were assigned to the project staff.
Staff work with such boys was begun in November, 1937, and [...] in
May, 1939. Two years later it was found that case loads of 35 boys
were too great a burden on the counselors and, in order to
facilitate more effective work, boys believed to be in no need of
service were retired. This “retirement” (65 boys) plus the
elimination of cases through death or mobility out of the project
area (113 boys) and the manpower shortage produced by the war, which
necessitated discharging boys from care as they reached their
seventeenth birthday (72 boys), resulted in subjecting the members
of the group to varying periods of treatment. This in brief is the
study organized to test Dr. Cabot’s hypothesis.

The book contains two parts. Powers, the author of the first part,
describes the project and the subjects chosen for treatment. Many
aspects of delinquency and its prevention in an urban setting are
adequately discussed. In this part of the book the reader will also
find a comprehensive treatment of the many problems which arose as
the project evolved. The second part, written by Witmer, is
concerned with the evaluation of the results of the experiment.

Before indicating the results of the experiment, it seems desirable
to describe briefly the personnel selected to implement Dr. Cabot’s
ideas. The search for staff was begun in 1935 and in all over 250
persons were considered for the 10 counseling positions. In their
search the directors of the project were primarily interested in
persons believed to possess intelligence, tact, unimpeachable
character, and professional experience in dealing with people. Those
selected were also to have an interest in the objectives of the
project. College education and training in professional social work
were not considered a prerequisite if the candidate was “a warm,
outgoing person who had that vital spark so essential in human
relationships” (page 92). Women, as well as men, were considered
since it was believed that the former would be particularly useful
in dealing with younger boys. Of the ten persons chosen to begin the
project four were women. A total of 19 different counselors were
employed in the duration of the experiment: 15 men and 4 women. Of
these 19, two had had experience as boys’ workers, two were
psychologists, one was a trained nurse, eight were professional
social workers and six others had completed some of the academic
requirements for a degree in social work.

The counseling staff of the project at- 
tempted to be of service to the boys in many 
ways. In general each counselor was ex- 
pected to learn to know the boys assigned 
to him as completely as possible so as to aid 
the boy to make a more effective adjustment 
to changing life situations. To do this effec- 
tively the counselor needed to be intimately 
acquainted with each boy’s assets and liabili- 
ties. But even more than this the counselors 
helped boys or members of their families to 
find employment, arranged for camp and 
summer placements, advised and counseled 
the boy’s family in respect to his problems, 
procured professional services to remedy the 
boy’s handicaps, taught and encouraged the 
boy to pursue hobbies and wholesome recrea- 
tional activities, etc. Thus, as it may be 
seen, the counseling staff operated in many 
fields and stood ready to aid both the boy 
and his family to meet a variety of needs. 

The results of the experiment indicate that 
little was accomplished. As a matter of fact, 
if we adhere strictly to the data presented, 
the differences in social adjustment of the 
boys in the experimental and control groups 
are insignificant. The services rendered by 
the project appear to be no more effective in 
achieving adjustment than the ordinary events 
in the lives of the boys. It seems apparent 
that delinquency cannot, on the average, be 
prevented by providing the services and coun- 
seling rendered by the project. All of this 
suggests that delinquency and maladjustment 
must be regarded as associated with a va- 
riety of combinations of psychosocial factors 
and any program intended to prevent such 




deviations must provide different techniques 
to deal with such different combinations of 
factors. 

This is an important book. Much too little 
has been done to put to a rigorous test the 
explicit or implicit assumptions that underlie 
much of what is done in social engineering. 
The experiment reported does this in a fashion 
that renders it an outstanding example of the 
best in social science research. Dr. Cabot’s 
idea failed to produce hoped for results but 
the experiment designed to test the idea is a 
significant contribution to all of the social 
sciences. 

Dr. Allport’s foreword is an exceptionally 
well written preview of the study. 


Elio D. Monachesi
Department of Sociology, 
University of Minnesota 


Personality: Symposia on topical issues, Vol. 
1, Nos. 3 and 4 (pp. 213-388). New 
York: Grune and Stratton, 1951. 

Of these two numbers, the first contains 11 
articles on Hypnosis and Personality and the 
second seven articles on Hypnotherapy. G. 
W. Williams’ introductory article discusses 
some of the unsolved problems of hypnosis; 
here, as in many other places in this sym- 
posium, the controversial nature of nearly all 
topics in this field is emphasized. Guze 
writes on posthypnotic behavior and suggests 
that a standard experimental situation
involving responses to posthypnotic suggestions
might be useful as a diagnostic tool, since 
it would show how subjects handle impulses 
not congruent with their usual behavior. 
True and Stephenson present a very impor- 
tant research article on the EEG, pulse, and 
plantar reflex in age regression and induced 
emotional states, in which they confirm the 
recent finding that the Babinski reflex ap- 
pears in subjects who are regressed to in- 
fancy, but they fail to find EEG changes. 
Harriman reports experiments on automatic 
writing, which resulted in very few conclu- 
sions. Loomis contributes a thorough survey 
of experiments from Bramwell to the present 
on space and time distortion in hypnosis. 
LeCron gives the results of an inquiry among 


hypnotists, which shows that they are generally poor subjects. A
short but vividly written article by Estabrooks discusses possible
antisocial uses of hypnosis. Weitzenhoffer gives a survey of the
major investigations of transcendence of normal voluntary
capacities and concludes that such transcendence is fairly well
established and that suggestions can cause alterations in nearly
all organismic activities. As frequently happens when contributions
are invited, a certain amount of recently published material is
warmed up and served again. Christenson, for example, has published
also in Psychiatry (1949) and in Experimental Hypnosis (edited by
LeCron) articles on dynamics in hypnotic induction in addition to
the one appearing here. A prospective rather than a retrospective
look is, however, characteristic of the article by Kline on
psychodiagnostic tests; of 15 articles cited eight are "in press."

In the number on Hypnotherapy there is a very useful introductory
article by Schneck which includes brief summaries of some of the
literature. Watkins' "Hypnotherapy in the military setting" offers
little to one who has read his book. Rosen's "Radical hypnotherapy
of apparent medical and surgical emergencies" contains four full
case reports and is largely new material. Kroger presents a very
thorough account of personality dynamics and hypnosis in gynecology
based upon Kroger and Freed's book. The articles by Raginsky
(anesthesiology), Heron (dental uses), and Abramson (obstetrical
uses) seem to this reviewer tantalizingly brief and general in
treatment, and his scanty information in these fields was not much
increased by them. Fuller articles on these topics with more
concrete descriptions of the procedures would have been welcome.

In spite of the question of multiple publication in a time when
nearly all outlets for publication are crowded, this symposium is
very valuable; it presents some new material and its summaries and
surveys and full bibliographies make it very useful for the
student, investigator, and practitioner.
Frank A. Pattie
University of Kentucky


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson,
Editor, Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota.


Maternal dependency and schizophrenia. Jo- 
seph Abrahams and Edith Varon. New 
York: International Universities Press, 
1953. Pp. 240. $4.00. 

The design of social research. Russell L. 
Ackoff. Chicago: The University of Chi- 
cago Press, 1953. Pp. 376. $7.50. 

Personality fundamentals for administrators.
Chris Argyris. New Haven: Labor and
Management Center, Yale University, 1953.
Pp. 123.

Roles and relationships in counseling. Ralph
F. Berdie, Editor. Minneapolis: University
of Minnesota Press, 1953. Pp. 37.
$1.25.

The social theories of Harry Stack Sullivan. 
Dorothy R. Blitsten. New York: The 
William-Frederick Press, 1953. Pp. 186.


Design for decision. Irwin D. J. Bross. New 
York: The Macmillan Company, 1953. 
Pp. 276. $4.25. 

Current theory and research in motivation. 
Judson S. Brown, et al. Lincoln: University
of Nebraska Press, 1953. Pp. 193.
$2.00. 

Professional problems in psychology. Robert
S. Daniel and C. M. Louttit. New York:
Prentice-Hall, Inc., 1953. Pp. 416. $5.50.

Political community at the international level:
Problems of definition and measurement.
Karl W. Deutsch. Princeton: Princeton
University Press, 1953. Pp. 71.

Steps in psychotherapy. John Dollard, Frank
Auld, Jr. and Alice Marsden White. New
York: The Macmillan Company, 1953.
Pp. 222. $3.50.

The structure of human personality. H. J.
Eysenck. New York: John Wiley & Sons,
Inc., 1953. Pp. 348. $5.75.

Farnum music notation test. Stephen E.
Farnum. New York: Psychological
Corporation. Pp. 11.

Symposium on fatigue. W. F. Floyd and
A. T. Welford, Editors. London: H. K.
Lewis & Co., Ltd., 1953. Pp. 196. 24s net.




Psychiatry and military manpower policy. 
Eli Ginzberg, John L. Herma, and Sol W. 
Ginsburg. New York: King’s Crown Press, 
1953. Pp. 66. $2.00. 

Sample survey methods and theory. Volume 
I. Morris H. Hansen, William N. Hurwitz, 
and William G. Madow. New York: John 
Wiley & Sons, Inc., 1953. Pp. 638. 

Sample survey methods and theory. Volume 
II. Morris H. Hansen, William N. Hur- 
witz, and William G. Madow. New York: 
John Wiley & Sons, Inc., 1953. Pp. 332. 
$7.00.

How to take a test. Joseph C. Heston. Chi- 
cago: Science Research Associates, 1953. 
Pp. 47. $.40. 

Developmental psychology. Elizabeth B. 
Hurlock. New York: McGraw-Hill Book 
Company, 1953. Pp. 556. $6.00. 

Psychological reflections. C. G. Jung. New 
York: Bollingen Series, 1953. Pp. 342. 
$4.50. 

A court for children. Alfred J. Kahn. New 
York: Columbia University Press, 1953. 
Pp. 359. $4.50. 

Sexual behavior in the human female. Alfred
C. Kinsey, Wardell B. Pomeroy, Clyde E.
Martin, and Paul H. Gebhard. Philadelphia:
W. B. Saunders Company, 1953. Pp.
842. $8.00.

Hypnotism for professionals. Konradi Leit- 
ner. New York: Stravon Publishers, 1953. 
Pp. 127. $4.00. 

Films in psychiatry, psychology and mental 
health. Adolf Nichtenhauser, Marie L.
Coleman, and David S. Ruhe. New York: 
Health Education Council, 1953. Pp. 269. 
$6.00. 

Applied imagination. Alex F. Osborn. New 
York: Charles Scribner's Sons, 1953. Pp. 
317. $3.79. 

New light on dreams. Max Serog. Boston: 
The House of Edinboro, Publishers, 1953. 
Pp. 159. $3.00.

Group relations at the crossroads. Muzafer
Sherif and M. O. Wilson, Editors. New
York: Harper and Brothers, 1953. Pp.
379. $3.50.

Lawless youth. E. A. Stephens. New York:
Pageant Press, 1953. Pp. 315. $3.50.

The study of behavior. William Stephenson.
Chicago: University of Chicago Press, 1953.
Pp. 376. $7.50.

Outline of executive development. Lee Stockford.
Pasadena: California Institute of
Technology, 1953. Pp. 46. $2.00.

Living with a disability. Eugene J. Taylor
and Howard A. Rusk. New York: The
Blakiston Company, Inc., 1953. Pp. 201.
$3.50.

The work of a counselor. Leona E. Tyler.
New York: Appleton-Century-Crofts, Inc.,
1953. Pp. 323. $3.00.

Recruiting the college graduate: A guide for
company interviewers. Richard S. Uhrbrock.
New York: American Management
Association, 1953. Pp. 31. $1.25.

How to help people. Rudolph M. Wittenberg.
New York: Association Press, 1953.
Pp. 64. $1.00.
Journal of Applied Psychology 


Vol. 38, No. 2


APRIL, 1954 


Personality Test Scores in the Management Hierarchy 


Henry D. Meyer and Glenn L. Pressel 


Stevenson, Jordan & Harrison, Inc., Chicago, Illinois 


The purpose of this study was to [...] an indirect but
comprehensive industrial validation of the paper and pencil
personality test developed by Stevenson, Jordan & Harrison
psychologists for use as an aid in their work of appraising
candidates and incumbents for management positions in industry.
The basic concept involved in the development of the test was that
certain personality traits become increasingly desirable in
incumbents and applicants as the positions bear increasing
responsibility and relative status in the management hierarchy.¹
The most obvious test of the validity of these traits would be to
determine whether or not the people holding positions at different
management hierarchy levels show differences in these same test
traits and whether the differences show constant increments as the
hierarchy levels increase.

This validation study does not make any discrimination between the
competent and incompetent person at any given level. Rather, the
assumption is made that some complex survival and elimination
process is operative because consistently fewer persons achieve
successively higher levels in the management hierarchy. If the
personality traits tested are pertinent to such selectivity, that
fact should be apparent in the distribution of trait scores at the
various hierarchy levels.

This type of "selection" criterion was much more acceptable to the
present authors than a criterion based on some authoritative
group's judgment of the managerial competence of a manager or
managers. Not only did the selection type of criterion eliminate
the necessity of getting agreement of judges as to what is
managerial competence and how it is observed, but also it allowed
the study to proceed around the design of a statistical analysis of
previously obtained data from S. J. & H. files. The execution of
the study therefore became a formal test of the hypothesis that
there are trends in personality trait test scores as one proceeds
from lower to higher level positions in the management hierarchy.

¹ The word "management" is used broadly here to cover all
positions above the hourly rate level, from engineer, salesman, or
accountant up to president.


Selection Procedures 


The industrial management hierarchy was di- 
vided into five job levels with officers and gen- 
eral managers at the highest level and hourly rate 
workers at the lowest level. A total of 100 cases 
for each level except at the top ² were selected
from S. J. & H. personnel evaluation test files. 
Each case had been given, at the original time of 
testing, the improved Form B of the Employee 
Questionnaire, the S. J. & H. personality test. 

The Personality Test. The Employee Ques- 
tionnaire, known as the E. Q. Test, is a brief 
industrial personality test developed and de- 
scribed in the literature by previous S. J. & H. 
psychologists headed by H. F. Rothe (1, 3, 4) 
and subsequently improved by increasing the 
number of items from 50 to 75 and modifying 
the trait scoring keys according to the results of 
an item analysis. The tests were scored on 
seven trait keys with 8 to 12 items in each trait 
key and with simple scoring of items without 
weighting. These traits were objectivity, social 
dominance, drive, detail, emotionality, extraver- 
sion (sociability), and (poor) adjustment. Rothe 
(4) has discussed the definition of all of these 
trait terms except detail, which may be defined 
as the liking for detail in work, thought, and 
recreation; the desire to personally take care of 
all the details of projects in which one is in- 
volved. For each trait the mean score and the 
standard deviation for each of the five hier- 
archy levels was computed. 
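
The last step described above, a mean and a standard deviation for each trait at each hierarchy level, might be tabulated along the following lines. This is only a sketch: the scores are invented for the example, the name `summarize_by_level` is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the article does not say which form was computed.

```python
from statistics import mean, stdev


def summarize_by_level(scores_by_level):
    """Return {level: (mean, sd)}, each rounded to one decimal place."""
    summary = {}
    for level, scores in scores_by_level.items():
        # stdev() is the sample standard deviation (n - 1 denominator);
        # this choice is an assumption, not something the article states.
        summary[level] = (round(mean(scores), 1), round(stdev(scores), 1))
    return summary


# Invented detail-trait scores for two hierarchy levels (not the study's data).
example_scores = {
    "I": [2, 3, 3, 4, 5],
    "V": [3, 4, 5, 6, 7],
}

summary = summarize_by_level(example_scores)
for level, (m, sd) in summary.items():
    print(f"Level {level}: M = {m}, S.D. = {sd}")
```

Repeating the call once per trait would give one row of a table of means and sigmas by hierarchy level.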

No claim is made that each trait is a pure, unitary factor.
Rather, the definitions describe traits that are felt to be
relevant to management success. The major intercorrelations among
traits for 161 cases where objectivity is held constant at a median
score are as follows: social dominance and extraversion, r = .78;
adjustment and emotionality, r = .67; detail and emotionality,
r = .66; adjustment and detail, r = .52; detail and drive, r = .46;
adjustment and social dominance, r = -.32; emotionality and drive,
r = .28; adjustment and extraversion, r = -.26; and extraversion
and drive, r = .25.

² Only 57 cases were available from July 1949 to February 1952
which filled the requirements of being in the top category and
taking the improved Employee Questionnaire, Form B personality
test.

A factor analysis ³ of the same data, i.e., with
objectivity held constant, reveals two major clus- 
ters and one minor cluster. The first cluster is 
social dominance and extraversion; the second, 
detail, emotionality, and adjustment; and the 
minor one, drive. 

Categorizing the Hierarchy. On the basis of 
the senior author’s experience in consulting with 
industrial concerns at all levels of management, 
the hierarchy was broken down into five grades 
of job status according to job titles as follows: 

I. President, Vice President, Treasurer, General Manager, General
Sales Manager and Executive Engineer.

II. Works or Plant Manager, Sales Manager, Chief Engineer, Chief
Industrial Engineer, Controller, Industrial Relations Director,
Purchasing Director.

III. Production Superintendents, Industrial Salesmen, Sales
Engineers, Department or Section Heads in Accounting, Industrial
Engineering, Design Engineering, Inside Sales, Purchasing and
Personnel.

IV. Production Foremen, Accountants, Design and Process Engineers,
Time Study and Production Control Men, Sales Correspondents, Jr.
Industrial Salesmen, Personnel Men.

V. Clerks and Factory Workers.

This breakdown of job status into hierarchy 
levels was reviewed and accepted by a supervising
engineer of the S. J. & H. engineering staff
as a rough approximation to the general trend in 
manufacturing industry insofar as one existed. 
This breakdown was used as a guide in sorting 
the actual cases into five hierarchy groups. 
While it was recognized that job titles are no 
guarantee of specific job content nor always a 
true reflector of the actual status of the job in 
the management, it was felt that such a breakdown
came as close as was possible to a general
criterion of management status. In any event,
the selection of the cases followed procedures 
more elaborate than merely reading a job title 
as will be shown in the next section. These job 
titles were the convenient way of expressing dis- 
tinctions in status that derived from the senior 
author’s consulting experience and cannot be de- 
fended any further than that. 

Selection and Placement of the Cases. In the selection of the
cases, the major attempt was to obtain purity of hierarchical and
occupational classification with as wide a variation in occupation
and company affiliation as permitted by the case history file. In
studying cases for placement, the man's whole work history was
reviewed. The criteria for selection and placement in a hierarchy
were:

1. That the person be now, or last, employed at a job clearly
recognized as belonging to a specific grade in the hierarchy or
that his employment record indicate that he had consistently or
steadily held such a job or jobs in the past.

2. That his employment record indicate a consistency of occupation
and that he had proceeded through a normal job succession up to the
job according to which he was classified.

3. That if the occasion for the testing was to apply for a
position, the job applied for be at the hierarchy level indicated
by his previous employment.

In reviewing the cases, the senior author frequently was able to
bring a personal knowledge of company size and organization
structure to bear upon the information provided in the case
history. Also, since all of the cases had to have been tested since
July of 1949, to have taken the improved E. Q. Form B personality
test, the senior author had interviewed the majority of the cases
himself regarding their job duties and histories. As a result of
this knowledge of company and job, a more consistent selection of
typical cases for each grade was obtained than could have been
obtained from job titles alone.

No attention was paid to test results in selecting cases, nor was
any consideration given to whether the person was appraised as
superior, average, or inferior. In fact, many of the cases had to
have their personality tests rescored on the improved key ⁴ after
they had been chosen for the study.

The attempt to secure 100 cases for each of the five categories
resulted in some stretching of the criteria in categories I and II
of officers and second level executives where for the last few
cases some men were chosen where the previous employers were not
well known and where the job applied for was lower than the
category in which the man was placed by reason of previous
employment.⁵

Also it should be noted that there was no attempt to control
company size or to modify the status of a job according to the size
of the company. Companies of all sizes are listed in the employment
histories of the cases selected, but most of the companies in which
the persons were presently employed or seeking employment were
[...] companies, or medium-sized plants of large companies. Typical
employment would be between 500 and 1,000 for the smaller plants in
the group and between 2,000 and 3,000 for the larger plants.

³ The authors are indebted to Vernon Keenan for making this factor
analysis.

⁴ The scoring key was revised following the item analysis.

⁵ A detailed summary of the present or most recent employment of
all cases chosen for each hierarchy category has been deposited
with the American Documentation Institute. Order Document No. 4191
from the ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C., remitting in
advance $1.25 for 35 mm. microfilm or $1.25 for 6 x 8 in.
photocopies. Make checks payable to Chief, Photoduplication
Service, Library of Congress.


Table 1

E. Q. Test Trait Score Means and Sigmas
According to Hierarchy Level

                                                  Hierarchy Level *
                      Total Group       I            II           III          IV           V
E. Q. Trait            M    S.D.     M    S.D.    M    S.D.    M    S.D.    M    S.D.    M    S.D.
Objectivity           4.4   1.9     4.3   1.9    4.2   2.0    4.2   1.9    4.6   1.8    4.5   1.8
Social Dominance      6.7   2.2     7.2   2.0    7.5   2.1    6.7   2.2    6.4   2.3    6.1   2.2
Extraversion          3.9   1.8     3.7   2.0    3.9   1.7    4.0   1.9    4.0   1.7    3.8   1.7
Drive                 5.9   1.8     5.6   1.7    5.8   1.9    6.0   1.9    6.2   1.7    6.0   1.9
Detail                4.3   1.9     3.2   1.6    3.9   1.9    4.4   1.9    4.7   2.0    4.9   1.9
Emotionality          4.0   2.3     3.0   1.9    3.7   2.3    3.7   2.3    4.3   2.5    4.9   2.5
Adjustment (poor) **  3.6   2.1     2.8   1.8    3.2   1.8    3.7   2.1    3.8   2.2    4.4   2.3

* N = 100 for levels II, III, IV, and V. N = 57 for level I.
** Significant at 5% level for single classification F test.

Statistical Procedures. The statistical procedures of the present
study were based on the results of a pilot study of 200 cases
chosen at random. The pilot study was a preliminary test of the
hypothesis of a relationship between personality trait scores and
management hierarchy levels in order to determine whether or not
the hypothesis had sufficient merit to warrant a full scale study.

The results of the pilot study indicated that trends in the E. Q.
test traits by hierarchy level probably existed for the social
dominance, detail, and emotionality traits and that there might
well be continuous increment trends. An analysis of the differences
between the means within the traits showing trends indicated that
it would be necessary to have 100 cases at each of the five
hierarchy levels in order that mean differences of the magnitude
observed be statistically significant. Hence, the major study was
done with 100 cases in each hierarchy level except level I, for
which only 57 cases were currently available. All subsequent
results are from the major study.
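
The article does not give the calculation behind the figure of 100 cases per level. A rough way to see how such a figure could arise is sketched below, assuming a normal approximation, two independent groups of equal size, a common standard deviation of about 2.0 (roughly the size of the sigmas in Table 1), and a two-tailed 5% criterion of 1.96. The function name and the mean differences tried are illustrative only and are not taken from the pilot study.

```python
import math


def cases_needed_per_level(mean_difference, sd, z_critical=1.96):
    """Smallest equal group size at which a difference between two
    independent means of the given size reaches z_critical standard errors."""
    # Standard error of a difference between two means, each based on n cases:
    #   SE = sd * sqrt(2 / n)
    # Requiring mean_difference / SE >= z_critical and solving for n gives:
    n = 2 * (z_critical * sd / mean_difference) ** 2
    return math.ceil(n)


# Illustrative mean differences, in raw score points.
for diff in (0.5, 0.6, 1.0):
    print(diff, cases_needed_per_level(diff, sd=2.0))
```

Under these assumptions, differences of about half a score point between adjacent levels call for something on the order of 100 cases per level, while larger differences need far fewer.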


Results 


The Test for Trait Score Trend Validity. A single classification
F test for hierarchy levels was used to determine the validity of
the hierarchy trend for each trait and a t test was used to
establish the validity of differences between trait score means at
the extremes of the hierarchy for each trait.

These tests demonstrated significant hierarchy trends for social
dominance, detail, emotionality, and adjustment, but failed to
reveal significant hierarchy trends for objectivity, extraversion,
and drive. Table 1 shows the trends in trait score means by
hierarchy level for the major study.
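
A single classification F ratio can be approximated from the published summary figures alone. The sketch below does this for the detail trait, using the level sizes, means, and sigmas as read from Table 1. It assumes the sigmas were computed with n - 1 in the denominator and it works from values rounded to one decimal, so the result is only an approximation to whatever F the authors obtained; the function name is hypothetical.

```python
def f_from_summary(ns, means, sds):
    """Single classification (one-way) F ratio from group sizes, means, and S.D.s."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-levels sum of squares: weighted squared deviations of level means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-levels sum of squares, assuming each S.D. used an n - 1 denominator.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = total_n - len(ns)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Detail trait, hierarchy levels I through V, as read from Table 1.
ns = [57, 100, 100, 100, 100]
means = [3.2, 3.9, 4.4, 4.7, 4.9]
sds = [1.6, 1.9, 1.9, 2.0, 1.9]

f_ratio, df1, df2 = f_from_summary(ns, means, sds)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

The ratio obtained this way lies well above the tabled 5% value of roughly 2.4 for these degrees of freedom, which agrees with the significant hierarchy trend reported for detail.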

The Test for the Independence of Hier- 
archy Trends. The establishment of hier- 
archy trends in several trait scores did not 
prove that hierarchy produced them independently
of other variables. There could be
other variables producing trait score trends 
which are associated with differences in hier- 
archy level. Age and education are certainly 
associated with differences in hierarchy level 
and might also produce trait score trends. 
Selective sampling by occupation might have 
occurred such that the hierarchical trends ob- 
served might have been due to trait score dif- 
ferences related to occupations rather than to 
hierarchy. Also, the present study and previ- 
ous studies (4) indicated that differences in 
objectivity trait scores were known to pro- 
duce differences in the four trait scores hav- 
ing hierarchy trends to as great an extent as 
hierarchy levels, particularly in the emotionality and adjustment
traits. Fortunately, the objectivity score means were
approximately the same for all five hierarchy levels in the present
study.

The first step in testing the independence of hierarchy as the
trend producing variable was the analysis of the data to see if
there were observable trends in the alternative variables noted,
i.e., age, education, and occupation. These results are shown on
Table 2 for age, Table 3 for education, and Table 4 for
occupation.


Table 2

E. Q. Test Trait Score Means and Sigmas
According to Age Level

                                      Age Level
                      Years        Years        Years        Years
                      20-30        30-40        40-50         50+
                      N = 87      N = 171      N = 152       N = 39
E. Q. Trait           M    S.D.    M    S.D.    M    S.D.    M    S.D.
Objectivity          4.7   2.0    3.5   [?]    4.1   1.9    4.3   [?]
Social Dominance     6.8   2.4    6.6   2.4    7.0   2.2    6.6   [?]
Extraversion         4.2   1.7    4.2   2.8    [?]   1.7    3.4   [?]
Drive                6.2   1.6    5.9   1.9    6.0   2.0    5.5   1.6
Detail               4.8   1.8    4.4   2.0    4.1   1.9    3.9   1.9
Emotionality         4.7   2.4    4.1   2.5    3.5   2.2    3.7   [?]
Adjustment (poor)    3.8   2.2    3.8   2.6    3.4   2.0    3.7   2.2

This analysis indicated that the hierarchy 
trend for adjustment trait scores might have 
been due to education alone and that the 
trends for detail and emotionality trait scores 
might have been due to both education and 
age. Occupational differences were too small 
to admit of much possibility for causing the 
observed hierarchy trends. 

The second step in testing the independence 
of hierarchy as the trend producing variable 
was a double classification F test analysis of 
variance pairing hierarchy with each of the 
variables—age, education, occupation, and 
objectivity.⁶ This would show, if the data
were adequate, whether trait score trends for 
hierarchy were independent of trait score 
trends for these four variables, co-existent 
with them but still independent, or interact- 
ing with them to produce the over-all effect 
labeled “hierarchy trend.” 
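
The logic of a double classification analysis, separating a row effect, a column effect, and their interaction, can be illustrated with the simplest case of equal numbers in every cell. This is not the authors' computation: the study had unequal cell sizes and used the "disproportionate subclass method," which is more involved. The scores below are invented, and `two_way_f` is a hypothetical name.

```python
def _mean(values):
    return sum(values) / len(values)


def two_way_f(cells):
    """F ratios for rows, columns, and interaction in a balanced design.

    cells[i][j] is the list of scores for row level i and column level j;
    every cell must hold the same number of scores.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[_mean(cell) for cell in row] for row in cells]
    row_means = [_mean(row) for row in cell_means]
    col_means = [_mean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    grand = _mean(row_means)

    ss_rows = n * b * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * a * sum((c - grand) ** 2 for c in col_means)
    ss_inter = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_within = ss_within / (a * b * (n - 1))
    return {
        "rows": (ss_rows / (a - 1)) / ms_within,
        "columns": (ss_cols / (b - 1)) / ms_within,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_within,
    }


# Invented scores: rows = two hierarchy levels, columns = two education levels.
example_cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [6, 8]],
]
f_ratios = two_way_f(example_cells)
print(f_ratios)
```

In this invented example the column variable carries nearly all of the effect and the interaction term is zero, the pattern that would correspond to a trait trend belonging to the paired variable rather than to hierarchy.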

Several difficulties occurred in the execution of this procedure
because of the unequal distribution of two of the four variables
throughout the hierarchy. Only two gradations of age could be
used, 30-40 years and 40 years and up, because no cases under 30
years were in hierarchy levels I and II, and few in III. Also,
only three hierarchy levels could be paired with occupation because
occupational specialization is infrequent in hierarchy I, which is
composed of cases in general management at the officer level and
because at level V, the lowest in the hierarchy, the five
occupational groups of production supervision, sales, accounting,
design engineering, and industrial engineering, give way to hourly
rate jobs with skill, craft, clerical, or service classifications.

⁶ Because the subclasses contained widely differing numbers, the
"disproportionate subclass method" for treating data classified in
unequal numbers of items was used (4, 235-240).

This shrinkage in the number of subclassifications for one of the
paired variables combined with the shrinkage in the number of cases
in each cell because of the failure to use all the data in the age
and occupation pairings with hierarchy made it much more difficult
to establish statistically significant independent trends with
these two variables than with the other two variables, education
and objectivity. As a result of these technical difficulties, the
double classification analysis of variance technique used in this
study is thought to be valid only on the positive side. That is,
where the double classification F test shows the independence of
the hierarchy trend from one of the four paired variables for a
given trait, that can be accepted as proof positive of
independence. But where independence of hierarchy trend is not
shown by this technique, the matter is still open to subsequent
proof or disproof using more cases and more numerous
subclassifications of the paired variables.


The double classification F test pairing hierarchy in turn with
age, education, objectivity, and occupation failed to demonstrate
any significant trend independence for drive and extraversion.
Adjustment was shown to have significant trend independence at the
[...] level or better when hierarchy was paired with age,
education, and objectivity; emotionality and social dominance had
significant trend independence when paired with age and education;
detail had significant trend independence when paired with
education and objectivity.

The complete absence of independent trends for the occupation
variable when paired with hierarchy is probably due to the small
variance of trait scores among occupations. The complete absence
of independent hierarchy trends when paired with occupation must be
discounted as possibly due to a limited number of hierarchy
subclasses in this pairing, classes II-IV. Also, the absence of
some expected independent trends for the age, education and
objectivity variables must be discounted as possibly due to the
limited number of subclassifications for these variables: 2 for age
and 3 for education and objectivity as compared with 5 for
hierarchy. Keeping the above technical limitations in mind,
certain statements may be made about the observed valid trends in
E. Q. test trait scores previously presented in Table 1.


Table 3

E. Q. Test Trait Score Means and Sigmas
According to Education Level

                                Education Level *
                       High         2 Years      College
                      School        College      Graduate
                      N = 179       N = 85       N = 193
E. Q. Trait           M    S.D.    M    S.D.    M    S.D.
Objectivity          4.4   1.8    4.7   1.9    4.3   1.9
Social Dominance     6.5   2.3    6.4   2.1    7.2   2.1
Extraversion         3.8   1.7    3.7   1.7    4.2   1.8
Drive                6.2   1.9    5.9   1.5    5.7   1.3
Detail               4.8   1.8    4.4   1.8    3.9   1.8
Emotionality         4.6   2.3    3.7   2.5    3.6   2.2
Adjustment (poor)    4.4   2.2    3.4   2.0    3.0   2.0

* High School = High School graduation or less; 2 Years College
= 1 or 2 years college; and College Graduate = 4 or more years of
college.


Table 4

E. Q. Test Trait Score Means and Sigmas
According to Occupation

                                         Occupation *
                        A            B            C            D            E
                      N = 63       N = 36       N = 35       N = 32       N = 84
E. Q. Trait           M    S.D.    M    S.D.    M    S.D.    M    S.D.    M    S.D.
Objectivity          4.1   1.8    4.8   1.9    4.6   1.9    [?]   [?]    4.3   2.0
Social Dominance     6.3   1.9    6.7   2.3    7.1   2.1    6.7   2.2    6.7   2.4
Extraversion         3.9   1.1    [?]   [?]    4.4   1.8    3.9   [?]    [?]   [?]
Drive                6.0   2.2    5.7   1.8    6.3   1.7    5.3   1.8    [?]   1.7
Detail               [?]   [?]    [?]   2.2    [?]   1.7    4.3   1.8    4.3   2.4
Emotionality         4.0   2.3    4.1   2.4    [?]   2.2    3.6   2.3    3.9   2.5
Adjustment (poor)    4.1   2.0    3.8   2.3    3.7   2.1    [?]   [?]    3.9   2.2

* A = Production Supervisors; B = Design Engineers; C = Salesmen;
D = Accountants; E = Industrial (Production) Engineers.


Summary of Primary Results 


1. The trend for higher social dominance 
trait scores as the hierarchy ascends is: (a) 
independent of age; (b) independent of edu- 
cation; (c) not proven to be independent of 
objectivity; and (d) not proven to be inde- 
pendent of occupation. 

2. The trend for lower detail scores as the 
hierarchy ascends is: (a) not proven to be 
independent of age; (b) the result of interac- 
tion of hierarchy and education even though 
there is some other quantitative degree of the 
hierarchy trend which is independent of edu- 
cation; (c) independent of objectivity; and 
(d) not proven to be independent of occupa- 
tion. 

3. The trend for lower emotionality scores 
as the hierarchy ascends is: (a) independent 
of age; (b) independent of education; (c) 
not proven to be independent of objectivity;
and (d) not proven to be independent of oc- 
cupation. 

4. The trend for lower, i.e., better adjust- 
ment scores as the hierarchy ascends is: (a) 
independent of age; (b) probably independ- 
ent of education (very close to 5% level of 
confidence) although there is a similar reduc- 
tion trend with increasing education that is 
independent of hierarchy; (c) a result of the 
interaction of the hierarchy and objectivity 
variables even though there is some other 
quantitative degree of the hierarchy trend 
which is independent of objectivity and some 
objectivity trend probably independent of
hierarchy (very close to 5% level of confidence); and (d) not
proven to be independent of occupation.

Secondary Results. A number of trends in
trait scores were observed for the variables 
of age, education, objectivity, and occupation 
alone. Of these, only age was tested for va- 
lidity of trend by a single classification F test 
analysis of variance. This was done because 
the double classification F test pairing age 
with hierarchy was felt to be inadequate be- 
cause the full age range of the data could not 
be used. The single classification F test for 
age alone showed a valid trend, at better than the 5% level of
confidence, for detail trait score means of successively older age
groups to decline. The previously observed trends for extraversion
(sociability), emotionality, and drive trait scores to decline with
increasing age (Table 2) were not marked enough to prove themselves
valid at the 5% level of confidence.

Increasing amounts of formal education gave trait score trends of
lower (poor) adjustment, emotionality, detail, and drive trait
score means (Table 3). These trends were not tested for validity by
the single classification F test because the double classification
F test pairing education and hierarchy was thought to be adequate.
As indicated previously, only the trend for adjustment scores to
decline with education proved valid at the 5% level of confidence.

Occupational differences in trait score means occurred but could
not be called trends (Table 4). The differences were not great
enough and the number of cases in each occupational group was too
small to establish their validity by statistical procedures.
Salesmen and industrial engineers were highest in the traits of
extraversion (sociability) and drive. Design engineers were lowest
in extraversion and highest in objectivity. Accountants were lowest
in objectivity and (poor) adjustment, the latter probably because
of the former. Production supervisors were highest in adjustment.
There were no marked differences among the five occupational
groups investigated in the trait scores of detail or emotionality.

Decreasing objectivity trait score groups gave trait score trends
of higher extraversion and lower detail, emotionality, and (poor)
adjustment trait scores. Only the (poor) adjustment trait score
trend for objectivity differences proved valid in the double
classification F test pairing objectivity and hierarchy. The use of
a larger number of objectivity subgroupings and more cases would be
required to clarify whether the traits of dominance, detail and
emotionality also have valid trends with objectivity.


Discussion 


Inasmuch as two of the failures of traits to establish the
independence of their hierarchy trends occurred when hierarchy was
paired with objectivity, it is of utmost importance to recognize
that objectivity trait score means are practically constant for all
five grades of the hierarchy. Hence, while it can be concluded that
hierarchy trends will not occur in emotionality or dominance trait
scores when dealing with only high or low objectivity score groups,
it can be said that these effects will cancel each other out in a
randomly selected sample that gives a normal distribution of
objectivity scores. Therefore, in the present study, where a normal
distribution of objectivity scores occurred at all hierarchy
levels, emotionality and dominance trait score trends appeared
which were not due to objectivity differences among the hierarchy
grades.

The hierarchy trend for detail trait scores cannot be separated
from the age variable at the present time because of the lack of
younger people in the upper hierarchy levels. However, age is also
closely related to administrative and managerial experience in the
present sample and such experience could be the true variable
associated with the detail trait hierarchy trend. A control study,
keeping age constant at 35 to 45 years with the N at each hierarchy
level ranging from 27 to … cases, gave a consistent hierarchy trend
for detail of about the same magnitude as in the major study
(Table 1) in which age was uncontrolled. Hence the total evidence
favors independence of the hierarchy trend for the detail trait
from the age variable, if not from managerial experience.

It is also apparent that occupational influences on trait scores
overlie hierarchy influences and in a few cases may exceed them.
For example, junior salesmen had higher dominance scores than sales
managers in our small sample of the sales occupation. Furthermore,
the fact that hierarchy differences in trait scores were greater
than occupational differences points up the fact that vocational
guidance for adults, relative to tested personality traits, has a
hierarchy level dimension which may be more discernible than the
occupational dimension.

Extremely interesting to the authors is the fact that two of the
four traits showing valid hierarchy trends, i.e., detail and
emotionality, showed the least differences among occupational
groups of all seven traits. Only in positions deemed
administrative, such as general managers, works’ managers, and
industrial relations directors did a sharp reduction in
emotionality and detail trait score means become apparent. This
suggests the hypothesis that hierarchical differences are primarily
differences in the breadth and generality of administrative
responsibilities. The hierarchy, according to this hypothesis,
proceeds from specific occupational activities to the
administration of specific occupational activities, to the
administration of more diverse and more generalized occupational
activities; and rising in the hierarchy is favored by personality
traits suitable in degree and kind to such administrative
responsibilities.

As a precaution against overgeneralizing the 
results of this study, it should be remembered 
that at every hierarchy level there was a 
nearly normal distribution of scores for every 
trait and that the observed trait trends were 
only small changes in the central tendencies 
of these distributions. That is why it was so 
difficult to establish the validity of these 
trends. It took five hierarchy levels with 
approximately 100 cases in each to do it. 
The wide dispersion of scores could be due to 
the fact that there are many other probable 
determiners of hierarchy-climb survival or 
achievement than personality trait scores. 
Intelligence, experience, education, political 
skill, competition, motivation, values, etc., 
are other variables that come to mind. Con- 
sidering these many probably contributing 
variables, it is remarkable that a brief pencil 
and paper personality test could show con- 
sistent and valid hierarchy trends in four of 
its seven trait scores. It should also be noted 
that of the two traits with the highest inter- 
correlation, social dominance and extraver- 
sion (sociability) r= .78, only one, social 
dominance, showed a hierarchy trend. This 
throws doubt on the validity of Ellis’ (2) 
implied criticism that personality inventories 
with overlapping traits are undesirable. 

Also, it should be noted with caution that the present study does
not offer any direct evidence as to whether possession of these
“trend” trait scores on the “high” side of the distribution at a
single level of the hierarchy indicates that their possessors are
more competent in their jobs than people on the “low” side within
that same hierarchy level. Rather,
the evidence is all indirect and follows a “sur- 
vival” concept based on the belief that there 
is a progressively more stringent selection for 
fewer and fewer jobs as the hierarchy ascends. 
Since the variance in four trait scores has 
been demonstrated to have hierarchy “sur- 
vival” value by this study, it can be con- 
cluded that this study has been, to some de- 
gree, successful in indirectly validating S. J. 
& H.’s E. Q. Personality test for use as one 
tool among several in their appraisal of candi- 
dates and incumbents for management posi- 
tions. Also, in a limited way, it has con- 
tributed in the broader task of isolating the 
characteristics of industrial managers. It re- 
mains for a future study to determine whether 
there are additional hierarchy trend traits and 
whether all trend traits “develop” in their 
possessors with job experience or exist full 
blown from early adulthood. 


Summary 


The traits of (poor) adjustment, emotion- 
ality, detail and social dominance as measured 
by Form B of the Employee Questionnaire, a 
brief industrial personality test developed by 
Stevenson, Jordan & Harrison psychologists 
(1), were found to have valid management 
hierarchy trends. The traits of extraversion (sociability), drive,
and objectivity did not have valid hierarchy trends. There was no
rating criterion of success or failure for the cases studied.
Rather, current achieved position in the hierarchy was the implied
criterion since the cases studied held jobs at a particular level
in the hierarchy.

The criterion of validity for trend was a single classification
analysis of variance of trait scores for the five hierarchy
categories giving an F ratio at the 5% level of confidence or
better. Also, validity was demonstrated by a t test of the
significance of differences between trait score means of the top
and bottom hierarchy categories giving a t ratio at the 5% level of
confidence or better. The valid trends were such that detail,
emotionality and (poor) adjustment trait score means decreased at
each successively higher level of the hierarchy while social
dominance trait score means increased as the hierarchy levels
became successively higher.
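
As an illustration of the single classification analysis of variance named above, the following minimal sketch computes an F ratio from grouped scores. The function name `one_way_f` and the three small groups of scores are hypothetical and are not data from the study; the resulting F would still have to be compared with a tabled critical value for the stated degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: deviations of scores from their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical trait scores for three hierarchy levels (illustration only).
levels = [[6, 5, 7, 6], [5, 6, 5, 4], [4, 5, 3, 4]]
f_ratio, df_b, df_w = one_way_f(levels)
print(f_ratio, df_b, df_w)
```

With the hypothetical scores shown, the group means fall steadily from level to level, which is the kind of trend the F ratio is meant to detect.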

The industrial management hierarchy was divided into five levels
with company officers and general managers at the top level and
hourly rate employees at the bottom level. One hundred cases at
each of the five levels except the top level, which had 57, were
utilized in the study.

The valid hierarchy trends for the four traits mentioned were found
by a double classification analysis of variance technique to exist
independently of the variables of age, education, and objectivity
at the 5% level of confidence or better with the following
exceptions. The detail trait trend was not proven to be independent
of age. The social dominance and emotionality trait trends were not
found to be independent of objectivity. The latter exception was
not held to be a serious defect in demonstrating hierarchy trends
because objectivity trait score means were practically equal at all
five hierarchy levels.

Miscellaneous trait trends were observed for the variables of age,
education, objectivity, and occupation alone. But these were not a
major aspect of study and were not tested statistically. Hierarchy
differences in personality trait scores were generally greater than
occupational differences.

For all traits there was a substantial and normally distributed
dispersion of scores around the mean at every level of the
hierarchy indicating the probable participation of many variables
in addition to personality test scores in determining the hierarchy
level of the cases studied.


Received May 27, 1953.


References 


1. Carr, E. R., and Rothe, H. F. Validity of an objectivity key on
a short industrial personality questionnaire. J. appl. Psychol.,
1950, 34, 178-181.

2. Ellis, A. Recent research with personality inventories.
J. consult. Psychol., 1953, 17, 45-49.

3. Mitchell, M. B., and Rothe, H. F. Validity of an emotional key
on a short industrial personality questionnaire. J. appl. Psychol.,
1950, 34, 329-332.

4. Rothe, H. F. Use of an objectivity key on a short industrial
personality questionnaire. J. appl. Psychol., 1950, 34, 98-101.

5. Snedecor, G. W. Statistical methods (4th ed.). Ames, Iowa: Iowa
State College Press, 1946.


The Journal of Applied Psychology
Vol. 38, 1954


Temperament Measures in Industrial Selection * 


Frederick Herzberg


Psychological Service of Pittsburgh and University of Pittsburgh


* This research was supported by a grant from the Buhl Foundation.


Psychologists have reached an advanced stage in the development of
test measures which can be applied to the problems of industrial
selection. The least dependable of these lies today in the area of
personality and temperament assessment. One of the reasons for the
wariness with which industrial psychologists approach temperament
test inventories is the transparency of such tests and their
corresponding amenability to faking or pointing answers to achieve
a desired result. Many studies have demonstrated the possibility
that this faking can occur (1, 2, 3, 4, 5). These studies, however,
have generally been based on artificial situations in which college
students have been instructed to attempt such faking.

One may ask, as Guilford does, in his manual for the
Guilford-Zimmerman Temperament Survey, whether such faking or
pointing occurs in the actual situation. And if it does occur, to
what extent does it invalidate or limit the use of the test for
industrial selection purposes?

Two hypotheses were examined in this study relating to the
existence in the employment situation of such manipulation of test
scores. The first hypothesis is that the distribution of the
Guilford-Zimmerman Temperament scores will be significantly higher
for subjects tested in the industrial situation than the
distribution of scores for either college students or of clients
seen for vocational guidance.

It is assumed here that these three groups have three different
motivations for taking this test. The motivation for the industrial
group is to get a job or to get promoted. The motivation of the
vocational counseling subjects, on the other hand, is to gain
information about their abilities and job opportunities, while the
college students’ reason for taking the test is basically an
academic one of pleasing the instructor or participating in an
experiment.
The second hypothesis is that in industrial testing, where faking
is expected to occur, the educational level of the examinees will
affect the extent of such faking; i.e., the higher the education
the higher will be the score distributions. This is suggested by
the fact that
the higher educational groups have more gen- 
eral intelligence to understand the implica- 
tions of the items and more test sophistica- 
tion from their longer academic experience. 
The Guilford-Zimmerman test was chosen 
for this study because it is one of the most 
widely used personality inventories of the 
non-psychiatric type. Its avoidance of psychiatric terminology and
goals makes it more applicable in industrial personnel work.


Method 


Population. The industrial group (those 
tested for employment, promotion, or com- 
pany personnel survey purposes) consists of 
a total of 924 cases, of which 338 are college 
graduates, 128 have had 1-3 years of college 
education, 353 are high school graduates, and 
105 have only elementary school education. 

The self-referral group (vocational guidance clients) contains 94
college graduates and 56 high school graduates.

The college group (University of Pitts- 
burgh students in Introductory Psychology 
classes) consists of a total of 109 students 
approximately equally distributed among the 
four years of freshman to senior. 


All subjects are males. 
Analysis. Frequency distributions and basic distribution statistics
were computed for each of the “motivation” and educational groups.
There was considerable skewness in many of the distributions with
corresponding unequal variability between the comparison groups.
All distributions were unimodal and plots showed that the
differences between groups lay in higher scores for one
distribution as opposed to the other. The differences between the
means of the various groups were tested for significance by
Student’s “t” ratio. The .01 level of confidence was accepted.
There were significant age differences between some of the groups
but this was proved not to be a pertinent variable in this study.


Table 1

Summary of Significant Differences Between Means of
Guilford-Zimmerman Scales for Groups Studied

Comparisons:
  Industrial College Graduates vs. Industrial High School Graduates
  Self-Referral College Graduates vs. Self-Referral High School Graduates
  Pitt Freshmen and Sophomores vs. Pitt Juniors and Seniors
  Total Pitt Sample vs. Industrial Non-College Graduates
  Industrial College Graduates vs. Self-Referral College Graduates

Scales: Gen. Activity, Restraint, Ascendance, Sociability,
Emot. Stability, Objectivity, Friendliness, Thoughtfulness,
Personal Rel., Masculinity

* Difference between means significant at .05 level of confidence.
** Difference between means significant at .01 level of confidence.


Results


… Guilford-Zimmerman scales.

A comparison on the basis of education within the industrial group
shows higher … scale on the Guilford-Zimmerman … grammar school
through college graduation. The scales on which the differences
between high school and college education are significant at the
.01 level of confidence are Ascendance, Sociability, Emotional
Stability, … Personal Relations, and Masculinity. All these
differences with the exception of Masculinity are significant at
beyond the .001 probability level.

Two differences at the .01 level (Objectivity and Personal
Relations) occur between college graduate self-referrals and the
high school graduate self-referral sample. Only for the Masculinity
scale is there a significant difference between University of
Pittsburgh freshmen and sophomores and Pittsburgh juniors and
seniors. All three differences are again in the direction of higher
means for the higher education groups.
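
The comparisons of group means reported here rest on Student's t ratio. A minimal sketch of that ratio in its usual pooled-variance form is given below; the function name `pooled_t` and the two small score lists are hypothetical and are not the study's data. Since the Analysis section reports unequal variability between some groups, the pooled form should be read as an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, df) for two independent samples using a pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances (n - 1 denominators).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical scale scores for two education groups (illustration only).
college = [20, 22, 24, 26, 28]
high_school = [16, 18, 20, 22, 24]
t_ratio, df = pooled_t(college, high_school)
print(t_ratio, df)
```

The obtained t would then be compared with the tabled value for the chosen level of confidence, which in this study was .01.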
In order to compare the industrial population with an academic
motivation group, the norms provided by Guilford could have been
utilized. However, since our industrial subjects are from Western
Pennsylvania and the manual norms are based upon California college
students, it was decided to gather norms on an equivalent
Pittsburgh college population. University of Pittsburgh norms were
found to be essentially similar to those reported by Guilford with
the exception of a higher Masculinity score for Pitt students.

The Pittsburgh college group was then compared with non-degree
college education industrial cases. This latter group was selected
for comparison with the Pitt students in order to equate for the
education level which was found to be of significance for the
industrial subjects. The norm values for the non-degree college
industrial sample are found to be approximately midway between the
norms for high school graduates and college graduates.



Significant differences at the .01 or better 
level of confidence differentiate these two 
groups in favor of the industrial population 
on the Restraint, Sociability, Emotional Sta- 
bility, Objectivity, Friendliness, and Personal 
Relations scales. A higher Ascendance mean 
was significant at the .05 point. 

Comparing the college graduate industrial 
group with college graduate counseling clients, 
we find the means of General Activity, As- 
cendance, Sociability, Emotional Stability, 
Objectivity, and Personal Relations scales all 
to be significantly different. These differ- 
ences are once more in the predicted direction 
of higher scores for the industrial cases. 


Discussion 


These results support both of the hypotheses stated in the
introduction. The industrial population, for equivalent educational
level, has higher means on most of the scales of the G-Z than do
corresponding academic and counseling client samples. In addition,
the
educational differences occurred primarily 
with the industrial samples. The hypothesis 
that faking or pointing of personality tests 
does actually occur in the industrial situation 
is well substantiated by these data; first, by 
their higher scores, and second, by the rein- 
forcing of this finding with the educational 
differences obtained. It seems reasonable 
therefore to conclude that clients for employ- 
ment or promotion do fake their test re- 


sponses and this occurs to a greater extent at 
the higher educational levels. 

As to the question raised in the introduction regarding the
significance of such pointing on the usefulness of the test, one
need only examine norms based upon a college graduate industrial
sample. For the Sociability scale, the median will fall at a score
of 25 on a 30 item scale, i.e., one-half of the group will achieve
scores of five-sixths or more of the possible number of items
included in that area. Similar results are found for the Emotional
Stability, Objectivity, and Personal Relations scales.

Perhaps the nature of the distribution of 
G-Z scores which are obtained in employ- 
ment testing is best illustrated by presenting 
the percentile ranks of scores for the groups 
studied which are equivalent to the medians 
on the manual norms. These percentile ranks 
appear in Table 2. 

The median scores, for example, on the 
published norms for the Emotional Stability 
and Personal Relations scales fall at the fif- 
teenth percentile when computed from a sam- 
ple of industrial college graduates. The other 
scales show similar discrepancies in the me- 
dian values. The equivalence of the Pitt 
sample to Guilford’s California college popu- 
lation is shown in the last column of Table 2. 
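
The percentile ranks discussed above answer the question of where a manual-norm median falls within another group's score distribution. The sketch below shows one common way of computing such a rank, counting half of any tied scores, and rounding it to the nearest fifth percentile as in Table 2. The function names, the sample scores, and the median value are hypothetical; the article does not state which percentile-rank formula was used.

```python
def percentile_rank(scores, value):
    """Percent of scores below `value`, counting half of the scores equal to it."""
    below = sum(1 for s in scores if s < value)
    equal = sum(1 for s in scores if s == value)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def to_nearest_five(rank):
    """Round a percentile rank to the nearest multiple of five."""
    return 5 * round(rank / 5)


# Hypothetical industrial sample and hypothetical manual-norm median.
industrial_scores = [17, 21, 23, 24, 25, 25, 26, 27, 28, 29]
manual_median = 22
rank = percentile_rank(industrial_scores, manual_median)
print(rank, to_nearest_five(rank))
```

A rank well below 50 for the manual median, as in this made-up sample, is what the article describes as piling-up of scores at the high end.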

With such extreme “piling-up” it is diffi- 
cult to conceive of the meaning of a high 
score on these scales, much less to utilize 


Table 2 


Percentile Ranks* of Scores for the Groups Studied Which Are Equivalent to the Medians on the Manual Norms


Self-Referral Self-Referral 


oan Lie College High School Pitt 
Scales Picea) ee Graduates Graduates Students 
55 50 
40 65 + 5 0 
: > 30 40 50 is 
A 40 50 60 K 
7 x j 0 60 50 
25 40 ; 
0 > 45 40 60 55 
F ee 35 45 50 55 
T 53 7 40 50 45 
P e 30 30 55 55 
M F 50 55 50 55 


* To the nearest fifth percentile. 




them. When one considers the curvilinear 
use of these tests, as recommended by Guil- 
ford, he would have to reject half the appli- 
cants on those scales where a high score is 
considered a drawback. 

These extreme results apply mostly to the 
use of the test with a college graduate indus- 
trial population. But this is just the popu- 
lation wherein the need for such a tempera- 
ment evaluation is greatest. The top level 
jobs involving supervision and personal rela- 
tions usually are held by college graduates, 
increasing the need of some assessment of 
their personality characteristics. 


Received May 14, 1953. 


References 


1. Cofer, C. N., Chance, June, and Judson, A. J. A study of
malingering on the MMPI. J. Psychol., 1949, 27, 491-499.

2. Hunt, H. F. The effect of deliberate deception on Minnesota
Multiphasic Personality Inventory performance. J. consult.
Psychol., 1948, 12, 396-402.

3. Kimber, J. A. M. The insight of college students into the items
on a personality test. Educ. psychol. Measmt., 1947, 7, 411-420.

4. Longstaff, H. P., and Jurgensen, C. E. Fakability of the
Jurgensen Classification Inventory. J. appl. Psychol., 1953, 37,
86-89.

5. Wesman, A. G. Faking personality test scores in a simulated
employment situation. J. appl. Psychol., 1952, 36, 112-113.


The Journal of Applied Psychology
Vol. 38, 1954


A Validation Study of the Worthington Personal History Blank 


John G. Clark and W. A. Owens 


Iowa State College, Ames, Iowa 


The Worthington Personal History Blank (hereafter, PH) consists of
an unstructured application blank which is used as a projective
technique in industrial selection.

Some evidence for the validity of PH has appeared in the form of
“testimonials” from users. Most of these have been favorable.
Somewhat more empirical evidence has come from Worthington in his
doctoral dissertation (5) and in a recent article (3). The former
offers evidence of a favorable relationship (approximately 87%
agreement) between PH analyses and psychiatric diagnoses for ten
V.A. Mental Hygiene Clinic patients. The latter article indicates
that PH was useful in predicting effectiveness of salesmen for a
light manufacturing company, as indicated by biserial correlation
coefficients of .34 with tenure and .31 with sales volume. … and
Newton (4) reported that in the prediction of supervisory
potentiality, the PH was accurate in 85% of the cases.

Since a single PH analysis costs in the neighborhood of $40.00, its
use would be justified only if its efficiency were considerably
greater than that of conventional, less expensive instruments. It
is, therefore, the purpose of this study to compare PH and
objective tests with respect to their relative efficiency in
predicting associates’ ratings in an industrial situation.


Method 


The subjects of the present study were 47 employees of an Iowa
publishing company. They were originally selected by the employment
manager as having rather distinctive personalities lending
themselves to PH diagnosis and to simple inspectional checks on the
accuracy of the protocols. In order to make possible a more
exacting test of the potential value of the instrument, the problem
was subsequently presented to the Department of Psychology of Iowa
State College as a possible thesis project. The department approved
it as
such, and it was decided that the most con- 
venient procedure for evaluation would be to 
compare the validity of the PH, against a 
criterion of associates’ ratings, with the va- 
lidity of certain standardized tests, against 
the same criterion. 

In addition to PH analyses for each sub- 
ject, percentile ranks were available on speed 
and power measures of intelligence (The 
Wonderlic Personnel Test and The Person- 
nel Laboratory’s Employment Test), on the 
Thurstone Temperament Schedule, and on 
the traits “Dominance” and “Self-Sufficiency” 
from the Bernreuter Personality Inventory. 

A five step criterion rating scale was con- 
structed, the traits included being selected on 
the bases of ease of rating and commonality 
with both PH and test results, particularly 
the former. 

PH reports were transformed into quanti- 
tative terms by five experienced psycholo- 
gists. These judges decided, on the basis of 
PH reports, whether a given subject should 
be classified as “high” (+) or “low” (-)
with respect to each of the traits under con- 
sideration, and the score assigned reflected 
the degree of their agreement. Literally, a 
six point scale was provided, ranging from 
five +’s to no +’s. It was unnecessary to 
perform this operation on the PH estimates 
of intelligence, since these were reported in 
terms of estimated Wechsler-Bellevue intelli- 
gence quotients. 
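
The six point scale described above can be sketched as a count of "+" judgments among the five judges, together with the two agreement percentages later reported for the judges. The function names and the four vote strings are hypothetical; reading "four of the five judges agreed" as "at least four" is an assumption of this sketch, not a statement from the article.

```python
def ph_score(votes):
    """Score a subject on one trait as the number of '+' votes (0 through 5)."""
    return votes.count("+")


def agreement_rates(all_votes):
    """Return (% of cases with all five judges agreeing, % with at least four agreeing)."""
    n = len(all_votes)
    # All five agree when every vote is '+' or every vote is '-'.
    unanimous = sum(1 for v in all_votes if ph_score(v) in (0, 5))
    # At least four agree when the '+' count is 0, 1, 4, or 5.
    at_least_four = sum(1 for v in all_votes if ph_score(v) in (0, 1, 4, 5))
    return 100 * unanimous / n, 100 * at_least_four / n


# Hypothetical votes of five judges for four subjects on one trait.
votes_by_subject = ["+++++", "++++-", "++---", "-----"]
scores = [ph_score(v) for v in votes_by_subject]
all_five, four_or_more = agreement_rates(votes_by_subject)
print(scores, all_five, four_or_more)
```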

The criterion ratings were made by two raters per subject. Since
the subjects were scattered throughout the company concerned, it
was impossible to obtain more ratings per individual or to have
each rater rate all of the subjects; and the ratings were,
therefore, simply pooled.

Since the test scores were reported only in centile ranks and could
not be assumed to be normally distributed, the correlation among
the variables was estimated in terms of contingency coefficients.
The following comparisons were made: Comparison I: PH vs. ratings;
Comparison II: Test results vs. ratings; Comparison III: PH vs.
test results; and Comparison IV: PH vs. ratings as compared with
test results vs. ratings.

The significance of the difference between the two series of
contingency coefficients in Comparison IV was tested through the
use of a randomization test (1).


Table 1

Agreement Among Judges on Quantification of PH Reports

                           Degree of      Degree of
Traits                     Agreement a    Agreement b
Activity                      68%            87%
Vigorous                      45%            14%
Impulsive                     45%            74%
Dominant                      42%             1%
Stable                        38%            59%
Sociable                      42%            84%
Reflective                    19%            57%
Self-sufficiency              38%            78%
Job-effectiveness             70%            89%
Promotion possibilities       42%            12%
Adjustment to others          36%            74%

a Per cent of cases in which all five judges agreed.
b Per cent of cases in which four of the five judges agreed.
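
For readers unfamiliar with the statistic, the sketch below computes Pearson's contingency coefficient, C = sqrt(chi-square / (chi-square + N)), from a two-way table of frequencies. The function name and the small tables are hypothetical; the article does not give its cell frequencies, and the sketch assumes every row and column total is greater than zero.

```python
from math import sqrt


def contingency_coefficient(table):
    """Pearson's C for a two-way frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return sqrt(chi_square / (chi_square + n))


# Hypothetical 2 x 2 tables: perfect association and no association.
c_perfect = contingency_coefficient([[10, 0], [0, 10]])
c_none = contingency_coefficient([[5, 5], [5, 5]])
print(c_perfect, c_none)
```

Note that C cannot reach 1.00: for a square table with k categories on each side its upper limit is sqrt((k - 1) / k), which is about .71 for the 2 x 2 case shown. Coefficients from tables of different sizes are therefore not directly comparable.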


Results 


The degree of agreement among the judges 
who quantified the PH reports is shown in 
Table 1. 


Table 2 shows the retest reliabilities of the 
ratings. 

Tables 3, 4, 5, and 6 show the results of 
the four comparisons. The only significant relationship found in
the first three comparisons was that between PH and the Thurstone
Temperament Schedule on the trait “Sociable.” This coefficient was
significant at the .01 level of confidence.

Comparison IV, the crucial one (Table 6), shows consistently higher
coefficients of relationship between objective test results and
ratings than between PH and ratings. The randomization test
indicates a probability of .06 that five such differences in the
same direction could occur by chance.

+ Test scores were grouped by deciles.


Table 2

Retest Reliabilities of Ratings
Note: 5% level, r = .29; 1% level, r = .37.

Trait                        r
Activity                    .68
Decisiveness                .46
Dominance                   .63
Personal Adjustment         .90
Sociability                 .02
Job-effectiveness           .19
Promotion possibilities     .93
Adjustment to others        .91
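
The reported probability of .06 is what a sign-flipping randomization test gives when all five paired differences point the same way: of the 2^5 = 32 equally likely sign patterns, only the all-positive and all-negative patterns yield a total as extreme as the one observed, and 2/32 = .0625. The sketch below enumerates those patterns. The function name and the five difference values are hypothetical, and whether reference (1) describes exactly this procedure is an assumption.

```python
from itertools import product


def sign_randomization_p(differences):
    """Two-sided randomization probability for the sum of paired differences."""
    observed = abs(sum(differences))
    magnitudes = [abs(d) for d in differences]
    as_extreme = 0
    total = 0
    # Enumerate every assignment of + and - signs to the difference magnitudes.
    for signs in product((1, -1), repeat=len(magnitudes)):
        total += 1
        permuted = abs(sum(s * m for s, m in zip(signs, magnitudes)))
        if permuted >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical differences (test coefficient minus PH coefficient), all positive.
diffs = [0.14, 0.09, 0.09, 0.06, 0.16]
p_value = sign_randomization_p(diffs)
print(p_value)
```

With only five pairs this is the smallest two-sided probability the test can produce, so the .05 level could not have been reached regardless of the size of the differences.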


Discussion 


The apparent, consistent superiority of the objective tests, as
indicated in this investigation, constitutes damaging evidence as
to the usefulness of PH. These results, while not in accord with
those of the previously mentioned studies of the PH, do tend to
follow the pattern of dubious or negative results found in
validational studies of other projective techniques (2).

Although not of major importance to the present study, the lack of
significant relationship to the criterion on the part of both
structured and unstructured techniques is of interest. There are
certain methodological problems involved which may, in part,
account for this lack of relationship. These are centered around:
(a) the criterion ratings; (b) the quantification of PH; (c) the
selection of tests; and (d) the nature and number of subjects.


Table 3

Contingency Coefficients (C) PH vs. Ratings

Traits                       C
Active                     .605
Impulsive                  .655
Dominant                   .654
Stable                     .676
Sociable                   .585
Job-effectiveness          .513
Promotion Possibilities    .697
Adjustment to Others       .614


Validation Study of the Worthington Personal History Blank 87 


Table 4

Contingency Coefficients (C) Test Results vs. Ratings

Traits         C       P
Active       .753    >.05
Impulsive    .749    >.05
Dominant     .740    >.05
Stable       .733    >.05
Sociable     .746    >.05


The main problem with the criterion has already been mentioned. Regardless of the absence of certain refinements, however, there is no reason to believe that the ratings were any poorer criteria for PH than for the tests.

The method of quantification of PH may be a source of error. However, the selection of the criterion traits was made on a basis which should not only be fair to PH but reasonably important to the selection process. The general agreement among judges indicates that the quantification of PH reports in terms of the selection of traits was adequately reliable. Surely it would seem reasonable to assume that if a group of experienced psychologists encountered difficulty in making a useful interpretation of PH reports, personnel workers would also have considerable difficulty in making a valid decision on the basis of the information contained in them. Additionally, if the method of quantification were to be suspected as a cause of error, one would not expect to find this factor operat-


Table 5

Contingency Coefficients (C) PH vs. Test Results

Traits               C       P
Intelligence
  Speed            .688    >.05
  Power            .757    >.05
Active             .685    >.05
Vigorous           .646    >.05
Impulsive          .709    >.05
Dominant¹          .673    >.05
Stable             .694    >.05
Sociable           .787    <.01
Reflective         .698    >.05
Self-sufficiency   .702    >.05

¹ From the Thurstone Temperament Schedule.


Table 6

Contingency Coefficients (C) PH vs. Ratings Compared
with Test Results vs. Ratings

Traits        PH vs.     Test Results    Differences
              Ratings    vs. Ratings
Active         .605         .753            .148
Impulsive      .655         .749            .094
Dominant       .654         .740            .086
Stable         .676         .733            .057
Sociable       .585         .746            .161


ing in the case of the comparison between PH 
estimates of intelligence and the intelligence 
test results, since the PH estimates were given 
in terms of Wechsler-Bellevue intelligence 
quotients. Table 5 indicates no tendency 
for the relationship between PH and the test 
results to be higher in the case of intelligence 
than in that of other traits. In fact, the only 
significant relationship obtained appears on a 
trait which was quantified by the judges (sociability).

To continue, only test scores already avail- 
able could be utilized, and certain traits meas- 
ured were not well adapted to use in a rating 
scale. The effect, however, would only be to 
minimize the apparent effectiveness of the ob-
jective measures and to make the PH look
relatively better. 

Finally, the subjects were neither a large nor a random sample from the industrial population. Since they were originally selected because they possessed some outstanding traits or characteristics (an advantage for the PH analyst if he were aware of it), they are by definition a very heterogeneous group. This great variability among them has undoubtedly operated to inflate all the estimates of association herein reported. It thus seems that with comparable subjects and a larger N many of the criterion relationships of Tables 3 and 4 would have been significant. It is difficult, however, to imagine that the apparent merit of PH vs. objective tests was largely influenced by the present choice of subjects.


Summary 


An investigation was conducted to com- 
pare the Worthington Personal History Blank 


88 John G. Clark and W. A. Owens 


(PH) and objective test results with respect 
to their relative efficiency in predicting asso- 
ciates’ ratings of 47 employees of an Iowa 
publishing company. By and large, neither 
PH nor the objective test results were signifi- 
cantly related to the criterion ratings; how- 
ever, the coefficients of contingency for the 
comparison of objective tests with ratings 
were consistently higher than those obtained 
for the comparison of the PH with ratings. 
This difference was significant at the .06 level
of confidence.

It was concluded that, under the condi- 
tions of the present study, the efficacy of the 
objective tests employed was at least as great 
as that of the PH, and very probably greater. 
It would thus seem that the use of the more 


expensive PH is not warranted in terms of 
cost. 


Received May 8, 1953. 


References 


1. Moses, L. E. Non-parametric statistics for psychological research. Psychol. Bull., 1952, 49, 133-136.

2. Schofield, W. Research in clinical psychology: 1951. J. clin. Psychol., 1952, 8, 255-261.

3. Spencer, G. J. and Worthington, R. E. Validity of a projective technique in predicting sales effectiveness. Personnel Psychol., 1952, 5, 125-144.

4. Swint, E. R. and Newton, R. A. The personal history, a second report. Reprinted from J. Ind. Train., Jan.-Feb., 1952.

5. Worthington, R. E. Use of the personal history form as a clinical instrument. Unpublished Ph.D. dissertation. The University of Chicago, June, 1951.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, 1954


A Factor-Analytic Study of Supervisory and Group Behavior * 


Robert C. Wilson, Wallace S. High, Helen P. Beem 


University of Southern California 


and 


Andrew L. Comrey 
University of California, Los Angeles 


A series of studies designed to isolate factors related to organizational effectiveness has been in progress at the University of Southern California (1, 2, 7). The general approach has involved the selection of a number of similar work units or organizations which would be divided into “high,” “medium,” and “low” groups with respect to criterion data of effectiveness. Questionnaires have been administered to individuals within the work units and the data analyzed against the criterion to determine if the individuals in the more effective work units answer the questions differently from those in the less effective work units.

During the course of this work, the questionnaires employed have gone through several stages of development. The current form involves the use of groups of homogeneous items or “dimensions” developed for the purpose of assessing characteristics of organizations hypothesized to have some relationship to their effective operation.

The dimensions have been revised from one study to the next, usually in the direction of increasing item homogeneity. This process has been carried out on the basis of item analysis information. Intercorrelations among dimension total scores, however, have made it clear that the dimensions overlap considerably, despite differences in the apparent content of the items making up those dimensions.

It was decided that a factor-analytic study of the principal dimensions would provide further information about the relationships

* This research was conducted under Contract N6onr-23815 between the University of Southern California and the Office of Naval Research. The opinions expressed are our own and are not necessarily shared by the Office of Naval Research. The project is directed by J. M. Pfiffner, with J. P. Guilford and H. J. Locke as associate responsible investigators.




among them and provide a basis for further 
revision and culling to yield a more economi- 
cal coverage of the domain. Further, it was 
believed that such a study would suggest 
areas which might be explored more fully by 
developing new dimensions. 


Procedure 


The Sample. The questionnaire was adminis- 
tered to 98 civilian journeymen at the Long 
Beach Naval Shipyard. The journeymen are 
skilled tradesmen who work on all phases of ship 
overhaul, repair, and construction for the U. S. 
Navy. Biographical data revealed that medians 
for the following variables were: (1) age, 40.2; 
(2) highest school grade reached, 11.6; (3) 
months worked for present supervisor, 12.6; (4) 
years in shipyard, 5.0; and (5) years in civil 
service, 5.8. 

The Questionnaires. Thirteen dimensions, or 
homogeneous groups of multiple-choice items, 
were factor-analyzed. The measuring instrument 
for each dimension contained six or eight items 
put in the following objective-item form: 


If some worker gets too “eager,” employees put pressure on him to make him quit working so hard: 1. always; 2. usually; 3. sometimes; 4. rarely; 5. never.


In this item, as with all others, the five re- 
sponse categories were arranged on a continuum 
from a response which expressed infrequency or 
very little of the variable in question, in this case 
Lack of Informal Pressures to Restrict Produc- 
tion, to a response which expressed frequency or 
a great deal of the variable in question. The re-
sponses were arbitrarily weighted from one to
five, according to the number preceding the re-
sponse.

To facilitate computation of reliability esti- 
mates, the six or eight items for each dimension 
were separated into comparable halves or sub- 
dimensions of 3 or 4 items. This also supplied 
more variables for the factor analysis.² A total


² If each dimension does represent a separate fac-
tor, an opportunity has thus been provided for the 
factor to come out. If doublet factors appear for 
these pairs of sub-dimensions, they can be regarded 
as specific factors for this analysis. 




score for each sub-dimension was obtained by
adding the scores assigned to the responses of an
individual for the items in that particular sub-
dimension.
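The scoring rule just described can be sketched as follows. The item identifiers and responses are hypothetical, and the alternating assignment of items to the two halves is an assumption; the text says only that the halves were comparable.

```python
def split_into_halves(item_ids):
    """Assign items alternately to two sub-dimensions (assumed rule, for illustration)."""
    return item_ids[0::2], item_ids[1::2]

def sub_dimension_score(responses, item_ids):
    """Sum the response weights (1 to 5) for the items of one sub-dimension."""
    return sum(responses[item] for item in item_ids)

# Hypothetical six-item dimension; each value is the number of the response chosen.
dimension_items = ["item_a", "item_b", "item_c", "item_d", "item_e", "item_f"]
responses = {"item_a": 5, "item_b": 4, "item_c": 3,
             "item_d": 4, "item_e": 5, "item_f": 2}

half_one, half_two = split_into_halves(dimension_items)
score_one = sub_dimension_score(responses, half_one)
score_two = sub_dimension_score(responses, half_two)
```

Each three-item sub-dimension score can range from 3 to 15, and each four-item score from 4 to 20.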

The Variables. The sub-dimensions included in the analysis were composed of items designed to reveal two kinds of information: (1) perceived characteristics of the respondent’s supervisor in his relations with employees; and (2) attitudes and interactions among members of the respondent’s work group. A sample item has already been given for one pair of sub-dimensions, 11 and 12, Lack of Informal Pressures to Restrict Production. For the other sub-dimension pairs, sample items are given below. When reference is made to a third person, e.g., “he” or “him,” the person referred to is the respondent’s supervisor. (1 and 2) Pride in Work Group: You are proud of the work record of your unit: not at all . . . very much; (3 and 4) Absence of Dissension: There are people in your unit who refuse to speak to each other: several . . . none; (5 and 6) Friendly Group Atmosphere: There is friendly kidding between people in your unit: not much . . . very much; (7 and 8) Group Cohesion: People in your unit act as a group to get things they want: never . . . frequently; (9 and 10) Intensity of Informal Control: There are certain workers in your unit, besides the boss, who seem to lead the others: not at all . . . very much; (11 and 12) see above; (13 and 14) Participation: He is willing to listen to your ideas: never . . . always; (15 and 16) Lack of Arbitrariness: He hates to have employees disagree with him: always . . . never; (17) Non-Apprehension of Authority:³ The employees seem to be afraid of him: very much . . . not at all; (18 and 19) Being Informed: He passes on interesting bits of information he gets from the front office: never . . . very frequently; (20 and 21) Feedback: He lets you know how you are doing: almost never . . . very frequently; (22 and 23) Attitude Toward Safety Enforcement: He tries to see that safety rules are observed: almost never . . . very frequently; (24 and 25) Social Nearness: He has close friends among his employees: none . . . four or more.

In the above items, the response given a weight
of “1” on the dimension appears first, with the 
response given a weight of “5” following. In- 
termediate responses have been omitted to save 
space. 

Four of the nine extracted factors gave some 
appearance of generality as regards the dimen- 
sions included in this analysis. Three of the 
factors were largely specific to a dimension pair 
and two were residual factors. 


³ One of the Non-Apprehension of Authority sub-
dimensions was not included in the factor analysis
because of lack of item homogeneity.




Interpretation of the Factors


The factors are presented in the approximate order of their clarity. Interpretations rest upon those variables with loadings of .40 and above. The factor loadings of dimensions defining the factors are given in Table 1.

Factor I. Supervisor-Subordinate Rapport. The dimensions with significant loadings on this factor seem to reflect the extent to which a consultative, communicative type of relationship exists between the supervisor and his subordinates. The items on the Participation dimension are principally concerned with the supervisor’s receptiveness to the ideas and opinions of his subordinates. The Lack of Arbitrariness dimension was intended to measure the degree to which the supervisor is not dogmatic and arbitrary regarding his orders and decisions. Non-Apprehension of Authority reflects the extent to which subordinates are not afraid of their supervisor. Feedback concerns the extent to which the supervisor informs his subordinates as to what he expects of them and lets them know how well they are meeting his expectations. Being Informed concerns the extent to which the supervisor tells his subordinates about things that are going on in the organization which may be of interest to them.
This factor seems to reflect the so-called “human relations” approach to supervision. The composition of this factor is quite similar to that of a factor called “Consideration” reported by Fleishman (3) and Gekoski (4). Gekoski attributes the factor to an unpublished factor analysis by Shartle and Hemphill (5). The “Consideration” factor is described as including “behavior which is indicative of friendship, mutual trust, respect, and a certain warmth between the supervisor and group” (4).
An alternative interpretation of the Supervisor-Subordinate Rapport factor is that it represents halo effect. Since all the dimensions appearing on the factor involve judgments about the supervisor, there is undoubtedly some tendency for the respondent to rate a supervisor uniformly high or low. On the other hand, substantial correlations among these rating variables might well be expected quite apart from any halo effects, in

that the person who utilizes participation is less likely to be arbitrary, probably tends to keep his subordinates informed, and so on. Perhaps the most likely interpretation of this factor is that it represents the combined effects of a common factor of Supervisor-Subordinate Rapport and halo.
Factor II. Congenial Work Group. Factor II represents the degree to which there is an absence of discord and the presence of friendly interaction among individuals and groups of individuals within the work unit. The items on Absence of Dissension are concerned with the lack of negative interactions, such as friction between workers, bad feelings, refusal to speak to one another, etc., while most of the items on Friendly Group Atmosphere emphasize the positive aspect of group relations, i.e., friendly kidding, good spirits, and high morale in the unit. The content of items on Lack of Informal Pressures to Restrict Production concerns the absence of animosities toward workers who are more productive than the others in the group. If informal work standards have been generated by the group, and social approval is withheld from those who do not conform to the norms, some antagonism or dissension may be present.
Factor III. Informal Control. This factor reflects the extent to which certain individuals outside the official chain of command exercise influence over the work group, or, more generally, the degree to which the group contains informal status hierarchies. Although the dimension Intensity of Informal Control contains items regarding several aspects of the influence of fellow employees upon the individual, the major component appears to be reflected in such items as: “There are certain workers in your unit besides the boss who seem to lead the others.”

Such informal or indigenous leadership in groups is a common phenomenon, documented by much research in this area. The loading of Participation on this factor suggests that this type of informal leadership is more likely to occur in a group whose supervisor is seen as receptive to the ideas and opinions of employees about the work. If the informal leadership is operating in the work situation, rather than simply in social situations, its emergence in such a situation may be due either to the supervisor’s delegation of authority or to his lack of vigorous leadership.
Factor IV. Group Unity. Factor IV ap- 
pears to represent the degree to which the 
group works together for a common purpose 
and the extent to which the group is ready to 
take action as a unit either on behalf of one 
of its members or on behalf of the group it- 
self. Group Cohesion and Friendly Group 
Atmosphere had substantial loadings on this 
factor for both dimension halves, with the 
former dimension contributing more heavily. 
The presence of the latter dimension on this 
factor suggests that a congenial interaction 
among group members is a necessary condi- 
tion for a closely-knit work group. The pres- 
ence of Lack of Arbitrariness possibly indi- 
cates that non-dogmatic behavior on the su- 
pervisor’s part may be conducive to friendly 
informal organization among the workers. 
Factors V, VI, and VII. These factors 
emerged as doublet factors defined by the 
three pairs of variables, Attitude Toward 
Safety Enforcement A and B, Social Near- 
ness A and B, and Pride in Work Group A 
and B. The fact that they emerged as sepa-
rate factors is evidence that they are measur-
ing something different from the other vari- 
ables in the analysis. They will serve the 
purpose of indicating areas in which we need 
to construct additional dimensions. In fur- 
ther analyses more general factors may occur 


in these areas. 


Summary 


Questionnaires were given to 98 skilled tradesmen at a naval shipyard. Items in the questionnaires were grouped to measure: (1) supervisory practices in relations with employees such as Participation, Lack of Arbitrariness, Being Informed, Feedback, Attitude Toward Safety Enforcement, and Social Nearness; and (2) attitudes and interactions of the members of the work group, such as Pride in Work Group, Absence of Dissension, Friendly Group Atmosphere, Group Cohesion, Intensity of Informal Control, Lack of Informal Pressures to Restrict Production, and Non-Apprehension of Authority. Each of


these item groups, or dimensions, was divided 
into two item pools, each pool containing 
three or four items. Twenty-five of these 
questionnaire variables were intercorrelated 
and factor analyzed by the centroid method. 
Rotation of the centroid axes to meaningful 
positions was carried out.* 
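The extraction step of the centroid method can be illustrated in outline. The sketch below shows only the first centroid factor and the residual matrix, using a hypothetical four-variable correlation matrix with communalities in the diagonal; it is not the authors' computation. Later factors additionally require reflecting variables before summing, and the rotation step is not shown.

```python
import math

def first_centroid_loadings(r):
    """Loading of each variable = its column sum / sqrt(sum of all entries)."""
    column_sums = [sum(column) for column in zip(*r)]
    grand_total = sum(column_sums)
    return [s / math.sqrt(grand_total) for s in column_sums]

def residual_matrix(r, loadings):
    """Correlations left after removing the factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]

# Hypothetical one-factor example: R is built from known loadings, so the
# diagonal holds communalities (a_i squared) rather than unities.
true_loadings = [0.8, 0.7, 0.6, 0.5]
r_matrix = [[a * b for b in true_loadings] for a in true_loadings]

loadings = first_centroid_loadings(r_matrix)
residuals = residual_matrix(r_matrix, loadings)
```

When a single positive factor accounts for all the correlations, as in this constructed example, the centroid loadings reproduce the generating loadings and the residuals vanish.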

The variables calling for evaluation of su- 
pervisory practices all emerged on a factor 
called Supervisor-Subordinate Rapport which 
appeared to reflect a consultative, communi- 
cative type of relationship between the super- 
visor and his subordinates. 

Three other important factors were related to relationships among the workers themselves: Congenial Work Group, Informal Leadership, and Group Unity. The first of these was concerned primarily with the absence of discord and the presence of friendly interaction within the work unit, the second with the degree to which there is informal leadership in the group, and the third reflected the extent


*To reduce printing costs, the tables of intercor- 
relations, complete rotated and unrotated factor load- 
ings and the complete set of items (13 pages) have 
been deposited with the ADI Auxiliary Publications 
Project. Order Document No. 4116 from Chief, 
Photoduplication Service, ADI Auxiliary Publica- 
tions Project, Library of Congress, Washington 25, 
D. C., remitting $1.75 for microfilm (images 1 inch 
high on standard 35 mm. motion picture film) or 
$2.50 for photocopies (6 X 8 inches) readable with- 
out optical aid. Advance payment is required. 
Make checks or money orders payable to: Chief, 
Photoduplication Service, Library of Congress. 


to which the group is unified and is ready to 
take action as a unit to get something for the 
group or for one of its members. The re-
maining identifiable factors were doublets, de-
fined by the paired sub-dimensions, Attitude
Toward Safety Enforcement, Social Near- 
ness, and Pride in Work Group. 


Received June 8, 1953. 


References 


1. Comrey, A. L., Pfiffner, J. M., and Beem, Helen P. Factors influencing organizational effectiveness. I. The U. S. Forest survey. Personnel Psychol., 1952, 5, 307-328.

2. Comrey, A. L., Pfiffner, J. M., and Beem, Helen P. Factors influencing organizational effectiveness. II. The Department of Employment survey. Personnel Psychol., 1953, 6, 65-79.

3. Fleishman, E. A. The description of supervisory behavior. J. appl. Psychol., 1953, 37, 1-6.

4. Gekoski, N. Predicting group productivity. Personnel Psychol., 1952, 5, 281-292.

5. Report Number 6, RF Project 403. Research Foundation, The Ohio State University.

6. Thurstone, L. L. Multiple factor analysis. Chicago: The University of Chicago Press, 1947.

7. Wilson, R. C., Beem, Helen P., and Comrey, A. L. Factors influencing organizational effectiveness. III. A survey of skilled tradesmen. Personnel Psychol., 1953, 6, 313-326.

8. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psychometrika, 1946, 11, 51-55.




The Check List as a Criterion of Proficiency * 


Arthur I. Siegel 


Institute for Research in Human Relations, Philadelphia 


The check list represents a particularly attractive method of measuring a man’s ability to perform a task. A performance check list is prepared by analyzing a task into the component actions which a man performs in order to complete the task. In some cases the task involves making something. In this case an end product evolves and the end product is also analyzed in terms of its adherence to certain prescribed standards and its freedom from defects. An examinee receives credit for each of the analytic performance components with which he conforms and each of the elementalistic aspects of his end product that meets prescribed standards. A total task score is achieved by adding a man’s credits on each of the analyzed components of performance and end product adherence to prescribed standards.

For instance, a performance check list for welding might consist of items relating to the way the welder performs his job (e.g., “cleans base metal and rods,” “adjusts oxyacetylene regulators to 4-5 pounds,” “preheats base metal”); items relating to the safety precautions the welder follows (e.g., “does not open acetylene cylinder valve more than 1½ turns,” “wears goggles when welding,” “makes sure fire extinguisher in area before igniting torch”); and items relating to adherence of the final weld to prescribed standards (e.g., “bead width 2 times metal thickness,” “bead height . . . of metal thickness,” etc.). The welder is given credit for each of the items with which his performance or final weld conforms, and his total score is the sum of these credits. The problem, as mentioned by Thorndike,² is that there may be aspects of a job that are lost in the analytic approach, so that scoring the elementary items does not

* The data herein reported were collected under contract between the Office of Naval Research and the Institute for Research in Human Relations. The opinions expressed are those of the author and do not necessarily represent the opinions of the Office of Naval Research or of the Naval Service.

² Thorndike, R. L. Personnel selection. New York: John Wiley, 1949.


give an entirely adequate evaluation. Spe- 
cifically, the scores obtained by subjects 
scored in an analytic manner may not cor- 
relate highly with the over-all, “clinical” 
judgments of experts as to the quality of a
final product produced.
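The analytic scoring rule described above amounts to a tally of credits. In the sketch below the three item groups follow the welding illustration in the text, but the particular items chosen, the observations, and the weight of one credit per item are assumptions made for illustration.

```python
# Hypothetical record for one examinee: True means the item was satisfied.
observations = {
    "performance": {
        "cleans base metal and rods": True,
        "preheats base metal": False,
    },
    "safety": {
        "makes sure fire extinguisher in area before igniting torch": True,
    },
    "end product": {
        "bead width 2 times metal thickness": True,
    },
}

def section_credits(items):
    """One credit (assumed weight) for each item with which the examinee conforms."""
    return sum(1 for satisfied in items.values() if satisfied)

def total_task_score(record):
    """Total task score = sum of credits over all sections of the check list."""
    return sum(section_credits(items) for items in record.values())

score = total_task_score(observations)
```

Keeping the sections separate makes it possible to see whether a low total arises from performance in process, from safety infractions, or from the end product itself.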

If performance check list scores do cor- 
relate highly with expert, “clinical” judgments 
or rankings of end products, the performance 
“check list” is to be preferred. This follows 
since more objectivity may be introduced 
into the check list, examiner reliability may 
be increased by the check list, test reliability 
may be increased by the check list and less 
background and less experience in the par- 
ticular test task is required by the examiner 
who uses the check list than by the examiner 
who makes a “clinical” appraisal. Moreover, 
if performance in process is checked as well 
as the quality of the final product, certain in- 
sights may be gained which could be missed 
by only an over-all, final appraisal of end 


products. 

Method 

Performance check lists³ were constructed for four tasks: aluminum butt welding; patching a hole in plastic; splicing a cracked aircraft channel; and aircraft fabric repair. The inter-examiner reliabilities for both the aluminum butt welding and fabrics lists were coincidentally at .92. The inter-examiner reliabilities for the plastics and channel splicing lists were not ascertained. However, the inter-examiner reliabilities on four other check lists similar to the ones herein discussed ranged from .91 to .97. Likewise, the intra-examiner reliability by retest methods for measurements made on the adherence of end products to prescribed standards was ascertained only for the welding and fabrics lists. These intra-examiner reliabilities were .93 and .87 respectively. Intra-examiner reliability for observations made of performance in process is difficult to obtain by the retest method. This difficulty follows because of the relative impossibility of having an examinee perform the same task in exactly the same manner on two separate occasions. However, by a motion picture technique, the intra-examiner reliability for performance in process was found to be .93 for the welding test. The intra-examiner reliabilities for performance in process of the remaining lists were not determined.

³ To save printing costs, the performance check lists as well as the correlational matrices upon which our discussion is based have been deposited with the ADI Auxiliary Publications Project. Order Document No. 4038 from Chief, Photoduplication Service, ADI Auxiliary Publications Project, Library of Congress, Washington, D. C., remitting $1.75 for 35 mm. microfilm or $2.50 for 6 by 8 inch photocopies.

The aluminum butt welding, plastic patch- 
ing, channel splicing, and fabric repair tasks 
were first administered to 15 aviation struc- 
tural mechanics at the Naval Air Technical 
Training Unit, Memphis. Four of the sub- 
jects held the Naval rate of “striker,” four 
held the rate of third class, four the rate of 
second class and three the rate of first class 
Aviation Structural Mechanic. All of the jobs 
represented in these tests are tasks which 
Naval Aviation Structural Mechanics typi- 
cally perform. The mean service length of 
the examinees was 54.6 months with a stand-
ard deviation of 38.3 months. The mean
scores and standard deviations for the group 
on each of the tests follow: aluminum butt 
welding, mean 15.0, sigma 6.9; splicing a 
cracked channel, mean 44.3, sigma 6.8; 
plastic repair, mean 20.8, sigma 6.5; and 
fabric repair, mean 33.4, sigma 9.0. The ex- 
aminers were Chief Aviation Structural Me- 
chanics who held instructors’ billets at the 
Naval Aviation Structural Mechanics School, 
Memphis. The examinees were unknown to 
the examiners prior to the testing situation. 

The end products produced by each ex- 
aminee on each of these tests were taken to 
the Naval Air Station, Atlantic City. At At- 
lantic City, five Chief Aviation Structural 
Mechanics were asked to rank, from best to 
worst, the end products produced by the ex- 
aminees on each of the tests. 
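The agreement statistics used with these rankings can be sketched as follows. The rankings are hypothetical, the formula assumes no tied ranks, and taking the median over all pairs of judges is an assumption about how the between-judge figure might be formed.

```python
from itertools import combinations
from statistics import median

def rank_difference_correlation(ranks_a, ranks_b):
    """Rank difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

def median_between_judges(rankings):
    """Median correlation over every pair of judges."""
    return median(rank_difference_correlation(a, b)
                  for a, b in combinations(rankings, 2))

def median_with_criterion(rankings, criterion_ranks):
    """Median correlation of each judge's ranking with one criterion ranking."""
    return median(rank_difference_correlation(r, criterion_ranks)
                  for r in rankings)

# Hypothetical rankings of five end products by three judges, plus the ranking
# implied by check list scores (illustration only).
judges = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [1, 3, 2, 5, 4]]
check_list_ranks = [1, 2, 4, 3, 5]

between = median_between_judges(judges)
with_check_list = median_with_criterion(judges, check_list_ranks)
```

With five judges there are ten pairs, so the between-judge median rests on ten coefficients, whereas the median against the check list rests on five.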


Results 


The. correlations of the rankings of the 
Chief Petty Officers who ranked the end 
products from each test with the rankings 
produced by our analytic and synthetic ap- 


Table 1 $ 
ee 
Median Correlations Between Chiefs and Bet 
Rankings of Chiefs with Analytic Approac 
pe 
5 jan yh 
Medien Meiel t 
Between A nal ch 
Test Chiefs ADE 
$ z 1 
Welding. 95 Mi 
Plastics 89 À 
Structural | ' 20 
Maintenance 37 ‘ “33 
Fabrics wor E 


proach and also the correlations between the 
rankings of the various chiefs who ranked 
the end products at Atlantic City were calcu- 
lated. All of these correlations were rank 
difference correlations. If the experts (Naval 
Chief Aviation Structural Mechanics) agreed’ 
among themselves more than they agreed wit 
the rankings produced by the analytic and 
synthetic approach, then the analytic a? 
synthetic approach has lost something that 
these experts considered to be important ™ 
making their rankings. On the other hand, 
if the experts agreed with the rankings P!” f 
duced by the check list as much as they 
agreed among themselves, then little has peen 
lost by the analytic and synthetic approa® 
The median rank difference correlation E 
tween the rankings of the chiefs at Atlanti 
City of the end products from each test E 
then obtained. The median rank different 
correlation of the chiefs’ rankings with e 
rankings produced by our analytic and Sy d j 
thetic check list approach was also calculate 
These ros are presented in Table 1. dit 
For the welding test, the median rank ‘ 4 
ference correlation of the chiefs’ ranking’ 
with the rankings produced by the analya 
and synthetic approach was .41. For Ai 
plastics test, the structural maintenance E 
and the fabrics test, the median rank di 
ence correlations of the chiefs’ rankings by 
the rankings produced by the analytic E d 
synthetic approach were .66, .26-and 331g 
spectively. For the welding test, the plas el 
test and the structural maintenance test, 7 
median rank difference correlations be 9) 
the chiefs’ rankings were .95, .89, 40° ‘tof 
respectively, and these three rios were °° 


Chech List as Criterion of Proficiency ` 95 


greater than the correlation of the chiefs' rankings with the rankings produced by the analytic and synthetic approach. However, the reverse was true for the fabrics test; median rho between chiefs .29, median rho of chiefs with analytic approach .33. All median rank difference correlations were then converted to product moment correlations and the product moment correlations transformed to z's (z transformation). The significance of the difference was then calculated between the z's which represented the median correlation between the chiefs' rankings of the end products of each test and the z's which represented the median correlations of the chiefs' rankings with the rankings produced by the analytic and synthetic approach for the same test. Of the four tests of significance calculated, only the difference between the median correlation between the rankings of the chiefs and the median correlation of the chiefs' rankings with the rankings produced by the analytic and synthetic approach for the welding test was statistically significant. However, at this juncture, it is well to point out that three of the four median rank difference correlations between the chiefs' rankings were greater than the median correlations of the chiefs' rankings with the rankings produced by the analytic and synthetic approach.
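
The conversion and test described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which formula converted rank difference correlations to product moment correlations, so Pearson's relation r = 2 sin(pi rho / 6) is assumed; the two correlations are treated as if they came from independent samples; and the sample size of 20 is hypothetical, since the number of cases behind each median is not given here.

```python
import math


def rho_to_r(rho):
    """Convert a rank difference rho to a product moment r.

    Assumes Pearson's relation r = 2*sin(pi*rho/6), which holds for an
    underlying bivariate normal distribution (an assumption, not a
    detail taken from the article).
    """
    return 2.0 * math.sin(math.pi * rho / 6.0)


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def critical_ratio(rho_a, rho_b, n_a, n_b):
    """Difference between two transformed correlations in standard-error units.

    Treats the correlations as independent, based on n_a and n_b cases.
    Both sample sizes are hypothetical inputs.
    """
    z_a = fisher_z(rho_to_r(rho_a))
    z_b = fisher_z(rho_to_r(rho_b))
    standard_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (z_a - z_b) / standard_error


if __name__ == "__main__":
    # Welding test medians from the text: .95 between chiefs,
    # .41 between chiefs and the check list. n = 20 is made up.
    print(round(critical_ratio(0.95, 0.41, 20, 20), 2))
```

A ratio beyond roughly 1.96 would be called significant at the 5 per cent level under these assumptions; with the real numbers of cases the value would of course differ.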

Thus, it seems that we were not able to demonstrate conclusively that the Chief Petty Officers in our sample agreed with the rankings produced by the analytic and synthetic approach more than they agreed with each other. On the other hand, in view of the bias (here controlled) that usually enters into judgments of end products, when judgments are made in actual work situations, it seems probable that the loss indicated by the three lower correlations of the chiefs' rankings with the rankings produced by the analytic and synthetic approach as compared with the correlations between the chiefs' rankings would be smaller if this bias had been allowed to operate here as a confounding variable. Moreover, some procedural elements were not apparent to the chiefs when they made their judgments of the final products. For example, a man may break safety rules and lose a certain amount of credit. Infractions such as these are not seen when making appraisals of end products, but may be too important to omit from consideration when estimating a man's real work ability.
Since the chiefs who did the ranking had only 
end products to evaluate, it seems probable 
that part of the loss indicated by the lower 
correlations between the chiefs’ rankings and 
the rankings produced by the analytic and 
synthetic approach as compared with the be- 
tween-chiefs’ correlations may also be as- 
signable to distortions in the rankings of the 


analytic and synthetic approach due to poor


care and use of equipment, violation of safety 
precautions, etc. on the part of the examinees. 


Summary 


As mentioned by Thorndike, there may be aspects of a job that are lost in breaking down a job in terms of the elemental, analytic components comprising the job. If this is so, then scoring the elemental, analytic components of a task does not give an entirely adequate evaluation. Therefore, an investigation was performed into the relationship between total scores assigned via a scoring of the elemental components of a job (performance check list approach) and over-all, "clinical" judgments of experts (Naval Chief Petty Officers) as to the quality of a final product produced. Rank difference correlations were calculated between the scores assigned via a performance check list and judgments of experts as to the quality of final products. Inter-correlations between the rankings of the experts on the quality of end products produced were also calculated. The following conclusions seem warranted.

1. Three out of four median correlations between the rankings of Naval Chief Aviation Structural Mechanics were not significantly different from correlations of the chiefs' rankings with scores obtained by an analytic and synthetic approach.

2. Although no statistical differences were shown in three of the four pairs of rank difference correlations under consideration, the tendency was toward greater agreement between experts' rankings than between the rankings of the experts and the rankings of the analytic and synthetic approach.


Received May 26, 1953.


The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Identification and Prediction of Two Training Criterion Factors 1


Warren R. Graham 
New York, N. Y.


The problem of identifying and predicting the variables which are involved in the successful completion of a comprehensive training program was attacked in this study. Twelve examination scores from six courses given as part of the Naval Pre-Flight training program were defined as the training criterion.

The nature of the criterion was studied by 
the Thurstone centroid factor method. The 
pre-flight examinations criterion was hypothe- 
sized to be composed of two or more factors 
which could be identified and separately pre- 
dicted. An attempt was made to predict the 
factors by an adaptation of the Doolittle 
multiple correlation method. 


Procedure 


In the parent study (3) a battery of stand- 
ard tests was administered to entering stu- 
dents, and scores from these predictors were 
correlated with subsequent classroom and 
final examination grades (criterion variables). 
The present study employs the ten predictors 
which had produced regression coefficients 
indicating that they were the best predictors 
of the criterion variables. 


The sample studied consisted of 399 students 
in four classes who were homogeneous in having 
met the following requirements: (1) minimum 
physical standards; (2) minimum standards on 
a Naval Aviation selection battery; (3) comple- 
tion of two years of college; and (4) between 
19 and 25 years of age. 

The predictor and criterion variables employed were as follows:

A. The Predictor Variables:

2. Language Skills Test, G. E. D., college level.
6. Minnesota Clerical—Name checking, (1946).
7. Mathematics—Diagnostic, (Ability to perform simple computations).
8. Mathematics—Reasoning, (No high school algebra or geometry required).
9. Mathematics—Total, (Combined score for tests 7 and 8).
10. American Council on Education Psychological Exam., 1947, Q Score, (Ability to handle quantitative and other nonverbal problems).
11. Above, L Score, (Linguistic or verbal abilities).
12. Above—Total, (Combined score for tests 10 and 11—general academic ability).
13. Aviation Classification Test, (General academic ability).
14. Mechanical Classification Test, (Ability to solve mechanical problems).

B. The Criterion Variables:

16. (Daily),2 17. (Exam.),3 Aerology, (Meteorology).
23. (Daily), 24. (Exam.), Engines, (Parts, functions, instrument checks, weights and balances, etc.).
26. (Daily), 27. (Exam.), Essentials of Naval Service, (Regulations, military courtesy, leadership, etc.).
30. (Daily), 31. (Exam.), Principles of Flight, (Aircraft nomenclature, lift, drag, stability, compressibility, etc.).
33. (Daily), 34. (Exam.), Navigation—Dead Reckoning, (Earth's coordinates, variation and deviation, winds, fixes, etc.).
35. (Daily), 36. (Exam.), Navigation—Celestial, (Astronomy, celestial triangle, latitude by Polaris, sextant, compass, time and radio signals, etc.).
39. Navy Grade, (a combined score of the criterion variables).

1 The author is indebted to Dr. Harold A. Edgerton for guidance and suggestions offered during analysis of the data. The materials treated herein were collected by Richardson, Bellows, Henry and Co., Inc., under Office of Naval Research Contract N 7 onr-383, TO-I, with the cooperation of the Special Devices Center and the Pensacola Naval Air Training Command. Interpretations and opinions are the author's and should not be construed as having U. S. Navy endorsement.


Statistical Technique. The twelve criterion examinations were factored, yielding three factors, of which two could be rotated to psychologically meaningful patterns of factor loadings, using Thurstone's centroid method and rotation.

The intercorrelations of the ten predictor variables which had been shown to be most related to the twelve criterion examinations in the parent study (3) were then added to the criterion intercorrelation matrix. This produced a rectangular intercorrelation matrix showing how the predictors were related to the criterion variables. The predictors were then factored in such a way as to obtain predictor factor loadings without changing the results obtained for the separate analysis of the criterion examinations.4

2 Daily Grades. Averages of grades from instructor-prepared weekly quizzes.
3 Exam. Grades. Grades on a two-hour exam. prepared by a central examining board.

When the factor loadings for the predictors 
had been obtained, they were considered to be 
validity coefficients between the predictors and 
the criterion factors. The Doolittle multiple 
correlation method was then employed to get a 
prediction of each of the two criteria obtained 
(the criterion factors) by each predictor in terms 
of standard partial regression coefficients (beta 
weights). The regression coefficients represent 
the degree to which a predictor will predict each 
criterion when the influence of the other predic- 
tors has been partialled out or controlled. 

Thus, instead of predicting each of the twelve criterion variables or a composite of them, two factors which economically and accurately represent them have been predicted.

The composite variables (Math-Total, Navy 
Grade and American Council on Education—To- 
tal) for which factor loadings were computed, 
were not included in the Doolittle solution to 
avoid contamination due to spurious correlations 
with the tests which compose them. 

No measures of the reliability of the criterion variables are available.


Results 


Tables 2 and 3 present the factors which resulted from analysis of the criterion examinations. Two of the three factors are psychologically meaningful when rotated to simple structure. These are:

Factor I. Navigation. Table 3 indicates that the highest factor loadings occur for the four navigation criterion examinations (variables 33, 34, 35, 36). As shown in Table 4, this factor was best predicted by the Minnesota Clerical Test-Name Checking, var. 6; the Mathematics Diagnostic Test, var. 7; and the American Council on Education Test-Q Score, var. 10. It was negatively predicted by the ACE Test-L Score, var. 11, and the Mechanical Classification Test, var. 14.

In general, the Navigation factor is best predicted by the tests of arithmetic and clerical (name-checking) abilities, and negatively predicted by the tests of linguistic abilities.

Factor II. Verbal Reasoning. Table 3 shows high loadings for all variables except those of an obviously non-problem solving nature. This indicates that general ability is of importance to success in courses in the Pre-Flight curriculum. Table 4 indicates that this factor is best predicted by the

4 See the Technical Appendix below for a description of the method.


Table 2 
Centroid Factor Matrix (before rotation) 
Predictors I II III

2. Lang. Skills, GED 287 074 —%6 

6. Minn. Clerical Test, Names 257 —144 —080 

7. Math. Diagnostic 346 —280 007 

8. Math. Reasoning 542 —105 —073 

9. Math. Total 51 —220 —228 
10. Am. Counc. Ed.—Q Score 318 —191 -u 
11. Am. Counc. Ed.—L Score 311 240 z 
12. Am. Counc. Ed.—Total 362 057 =le 
13. Aviation Class. Test 441 023 E 
14. Mech. Class. Test 361 104 28 

Criterion 

16. Aerology Daily 673 135 E y 
17. Aerology Exam. 648 130 o 
23. Engines Daily 701 159 101 
24. Engines Exam. 652 161 3 
26. Ess. Naval Serv. Daily 597 145 Za f 
27. Ess. Naval Serv. Exam. 444 158 ~ 306 
30. Prin. Flight Daily 636 306 125 
31. Prin. Flight Exam, 590 292 156 
33. Nav. Dead Rec. Daily 744 —317 148 
34. Nav. Dead Rec. Exam. 462 —435 ~ 420 
35. Nav. Celest. Daily 630 —372 093 
36. Nav. Celest. Exam. 543 —419 070 
39. Navy Grade 936 —090 = 
Mathematics Reasoning Test, var. 8, the Aviation Classification Test, var. 13, and the Mechanical Classification Test, var. 14. The ACE Test-L Score, var. 11, is a less significant predictor than any of these three tests, but it is considerably more effective than the arithmetic and clerical (name-checking) ones. In general, Factor II is best predicted by tests calling for reasoning ability.

The Language Skills Test, GED-College Level, does not contribute to the prediction of either factor.

Factor III. Unidentifiable Variance. Although this variable is not clearly interpretable, there is a possibility that it reflects relatively specific variance in the examinations for Essentials of Naval Service, var. 20. It might represent a personality or social factor which has not been sufficiently determined to be identified.

It is concluded that successful completion of the Navy Pre-Flight curriculum is dependent, in part, upon ability to perform simple arithmetic computations accurately, such ability being required to pass courses in navigation. That some clerical ability is required is indicated by the fact that the Minnesota Clerical Test-Name Checking, var. 6,




Table 3

Centroid Factor Matrix (after two rotations)

Predictors                              I      II     III

 2. Lang. Skills, GED                  005    295    —05
 6. Minn. Clerical Test, Names         215    215    Zoss
 7. Math. Diagnostic                   325    280    050
 8. Math. Reasoning                    225    505    020
 9. Math. Total                        370    450    —140
10. Am. Counc. Ed.—Q Score             250    275    025
11. Am. Counc. Ed.—L Score             —165   350    —045
12. Am. Counc. Ed.—Total               070    365    —175
13. Aviation Class. Test               080    440    —035
14. Mech. Class. Test                  —070   375    203

Criterion

16. Aerology Daily                     045    630    — 40
17. Aerology Exam.                     —010   660    072
23. Engines Daily                      —040   719    ne
24. Engines Exam.                      —045   675    ae
26. Ess. Naval Serv. Daily             040    610    —245
27. Ess. Naval Serv. Exam.             ood    470    gas
30. Prin. Flight Daily                 —205   685    165
31. Prin. Flight Exam.                 —180   0      4
33. Nav. Dead Rec. Daily               415    665    259
34. Nav. Dead Rec. Exam.               oD     go     a
35. Nav. Celest. Daily                 455    542    230
36. Nav. Celest. Exam.                 525    450    025
39. Navy Grade                         295    85     -


is a predictor of the Navigation factor. Verbal reasoning, the ability to solve verbal problems, is prerequisite to success in all academic courses (including navigation) in the Pre-Flight curriculum.


Technical Appendix 


A correlation matrix of the criterion variables was prepared (Table 1—Criterion Matrix) and subjected to multiple factor analysis by Thurstone's centroid method.

McNemar's (1) criterion for the number of factors to be extracted (that the SD of the residuals should be down to or below the value for the zero-order r's) is difficult to interpret precisely, but it suggests that three factors, and possibly more, should be extracted. The difficulty of interpretation arises in determining how small the residuals must be before it can be considered to meet the criterion of being "down to" the value for the zero-order r's. The procedure used in this study was to stop factoring when the residuals closely approximated this value.


Table 4

Regression Coefficients for the Prediction of Criterion Factors I and II

                                        I       II

 2. Language Skills, GED              —.016    .022
 6. Minn. Clerical Test, Names         .234    .042
 7. Math. Diagnostic                   .263    “038
 8. Math. Reasoning                    .036    .297
10. Am. Counc. Ed.—Q Score             .230   —.025
11. Am. Counc. Ed.—L Score            —.367    .123
13. Aviation Class. Test               .075    .205
14. Mech. Class. Test                  iid     .207


The zero-order r's for the predictor variables were simply added to the end of the criterion correlation matrix to form a rectangular matrix (Table 1). The predictor columns were summed and divided by the square root of the sum of the sums of the criterion columns for each factor, thus placing the predictors in the same space as the criterion variables. The predictor residuals were computed and reflected according to the reflections determined by the criterion variables. The factor loadings extracted (Table 2) were then rotated to the identical planes used for the criterion factors.
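
The step of summing the predictor columns and dividing by the square root of the criterion total can be sketched for a single centroid factor. The sketch below is a minimal illustration, not the author's worksheet: the function name and the small matrices are hypothetical, the diagonal of the criterion matrix is assumed to hold communality estimates, and the reflection of variables before later factors are extracted is left out (all entries are taken to be positive).

```python
import math


def first_centroid_with_extension(crit_corr, pred_crit_corr):
    """Extract one centroid factor from the criterion matrix and extend it
    to the predictors.

    crit_corr: n x n criterion intercorrelations (communality estimates
        assumed on the diagonal).
    pred_crit_corr: m x n correlations of each predictor with each
        criterion variable.
    Returns criterion loadings, predictor loadings, and both residual tables.
    """
    n = len(crit_corr)
    column_sums = [sum(crit_corr[i][j] for i in range(n)) for j in range(n)]
    root_total = math.sqrt(sum(column_sums))

    # Criterion loadings: column sum over the square root of the grand total.
    crit_loadings = [s / root_total for s in column_sums]
    # Predictor loadings use the same divisor, so the predictors are placed
    # in the criterion factor space without altering the criterion solution.
    pred_loadings = [sum(row) / root_total for row in pred_crit_corr]

    crit_residuals = [
        [crit_corr[i][j] - crit_loadings[i] * crit_loadings[j] for j in range(n)]
        for i in range(n)
    ]
    pred_residuals = [
        [row[j] - a_p * crit_loadings[j] for j in range(n)]
        for row, a_p in zip(pred_crit_corr, pred_loadings)
    ]
    return crit_loadings, pred_loadings, crit_residuals, pred_residuals


if __name__ == "__main__":
    # Hypothetical two-variable criterion matrix and one predictor.
    crit = [[1.0, 0.5], [0.5, 1.0]]
    pred = [[0.6, 0.3]]
    print(first_centroid_with_extension(crit, pred)[1])
```

Because the divisor comes only from the criterion columns, adding or dropping predictors leaves the criterion loadings untouched, which is the property the appendix relies on.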

The Doolittle method was used to obtain standard partial regression coefficients to express the relationships between the predictors and the criterion factors. The factor loadings of each predictor for each factor were placed in the criterion columns (i.e., two criterion columns [one for each factor] were used instead of the usual one, and the predictor factor loadings were treated as validity coefficients). The basic equations were solved for the regression coefficients to estimate the relationship of each predictor variable to each factor.
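
Solving the basic equations for two criterion columns at once may be illustrated as below. This is a sketch under assumptions: the Doolittle worksheet itself is not reproduced, and plain Gauss-Jordan elimination is substituted for it, which gives the same beta weights for a given set of normal equations. The function name and the two-predictor example are hypothetical.

```python
def solve_beta_weights(pred_corr, validities):
    """Solve pred_corr * beta = validities for several criteria at once.

    pred_corr: p x p predictor intercorrelation matrix.
    validities: p x k table, one column of validity coefficients per criterion
        (here, the predictor loadings on each criterion factor).
    Returns a p x k table of standard partial regression coefficients.
    """
    p = len(pred_corr)
    # Augment the predictor matrix with all criterion columns, so the
    # elimination on the predictor side is carried out only once.
    rows = [
        [float(x) for x in pred_corr[i]] + [float(x) for x in validities[i]]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("predictor correlation matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(p):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[p:] for row in rows]


if __name__ == "__main__":
    # Hypothetical values: two predictors correlating .50, two criterion factors.
    betas = solve_beta_weights([[1.0, 0.5], [0.5, 1.0]], [[0.4, 0.3], [0.2, 0.5]])
    for row in betas:
        print([round(b, 3) for b in row])
```

Each criterion column is carried along passively during the elimination, which is why adding a second criterion does not change the weights obtained for the first.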

The principal advantage to be derived from the initial factor analysis of the criterion variables is that the obtained criterion factors may be isolated free from the influence of the variance of the predictor variables. The rotated factor loadings of the predictors indicate which ones are most likely to be related to the criterion factors. Thus, computation of regression coefficients for irrelevant predictor variables may be avoided.

Adaptation of the Doolittle multiple correlation table to permit the simultaneous computation of regression coefficients for the prediction of several criteria does not alter the results ordinarily obtained for any single criterion. The nature of the computations permits the prediction of any number of criteria once the computations for the zero-order inter-correlations of the predictors are completed.

Received June 5, 1953.

References

1. McNemar, Q. On the number of factors. Psychometrika, 1942, 7, 9-18.

2. Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their mathematical bases. New York: McGraw-Hill, 1940.

3. Edgerton, H. A. A study of individual differences among naval aviation students. New York: Richardson, Bellows, Henry and Co., Inc., 1949.

4. Thurstone, L. L. Multiple factor analysis. Chicago: University of Chicago Press, 1947.


The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Rater and Technique Contamination in Criterion Ratings


Gloria H. Falk and A. G. Bayroff 1


Personnel Research Branch, The Adjutant General's Office, Department of the Army, Washington, D. C.


An important consideration in validation studies is the degree to which the procedures for obtaining the criterion measures are independent of the predictors. Should predictor test scores enter into the determination of criterion scores then the correlation between predictor scores and criterion scores will be artificially increased. Similarly, if the criterion measure is a rating, the validity will be inflated if the raters base their evaluations on prior knowledge of the predictor scores.

When both the predictor and criterion measures are ratings, the problem of criterion contamination may be critical. It may take the form of rater contamination as, for example, when both the predictor raters and the criterion raters are the same persons. Or, criterion contamination may take the form of technique contamination. This form of contamination may exist when both the predictor rating and the criterion rating employ the same rating technique. Thus, a graphic predictor rating may correlate more highly with a graphic criterion rating than would a forced-choice predictor rating with a graphic criterion rating. Since the usual criterion rating employs some form of graphic rating, the possibility exists that the graphic rating technique appears more valid than others primarily because both predictors and criteria employ the same measuring technique.


Problem


The subject of study described here was the comparative influences of two potential sources of criterion contamination—similarity of raters and similarity of techniques. An attempt was made to estimate the relative amounts of agreement between two sets of ratings when the two sets involved: (a) the same raters; and (b) the same techniques.


1 The opinions expressed in this article are those of the authors and do not necessarily express the official views of the Department of the Army. Acknowledgment is made of the participation of various staff members of the Personnel Research Branch, particularly Edward A. Rundquist and Helen R. Haggerty. Acknowledgment is also made of the generous assistance of the Commandant, Command and General Staff College, Fort Leavenworth, Kansas, and his staff, and the officer students in attendance during the gathering of the field data basic to the study.


Method 


Population. The population consisted of 400 officers (primarily majors and lieutenant colonels) enrolled as students at the Army Command and General Staff College. The objective of this college is to train potential division commanders and general staff officers, and its students represent a highly selected group. The course was 42 weeks long, during which the students were in close contact with one another. Each officer served as both rater and ratee.

Design of the Study. This study is a part of a larger research program on rating methodology. This report will be limited to those aspects pertinent to the present study. The study permitted a comparison to be made of the amount of agreement between ratings made with identical techniques and those made with different techniques under two general conditions: (a) when the same raters made the two sets of ratings; and (b) when different raters made the two sets of ratings.

Instruments. The rating techniques employed in this study were an eight-point graphic scale of over-all value to the service and two versions of the forced-choice technique.2 The graphic scale was provided with descriptions for each of its eight points, ranging from "The most outstanding officer I know" as point 1, to "An officer who does not have the calibre that one should reasonably expect in an officer" as point 8. To counteract the reluctance of most raters to use the low end of the scale, two of the points below mid-scale value were favorably defined, e.g., "An acceptable officer whose value is limited in some respects." The fact that the entire scale was used may, in part, be attributed to this device.

Both versions of the forced-choice technique had identical items. In one form, the controlled check list (CCL), the items were arranged in two sets of 24 phrases each. The rater selected the 12 most descriptive phrases in each set. In the second form, the forced-choice pairs (FCP), the items were arranged in 24 pairs. Phrases in each pair had similar preference values, but different discrimination values. The raters selected one phrase in each pair.

2 Schneider, Dorothy E. and Bayroff, A. G. The relationship between rater characteristics and validity of ratings. J. appl. Psychol., 1953, 37, 278-280.

Procedure. The classes consisted of 33-40 officers each. Each officer was assigned 20 other officers of his class to rate on the graphic scale. These assignments were made according to a procedure which approximated random selection, and the order in which the ratings were to be made was specified.

For purposes connected with other studies in this series and not relevant to this study, each class was randomly divided into two groups of raters. One group signed its ratings and was informed that these ratings would be available for official use; the other group did not sign its ratings and was told that these ratings would not be available for official use. Each ratee received half his ratings from one group and half from the other. The results in this study will be presented separately for the two groups as a partial replication device.

Eight days after the graphic ratings were made, each rater re-rated two of his fellow-officers, first on one of the two versions of the forced-choice technique and then on the same graphic scale used a week earlier.


Table 1

Product-Moment Correlations Between Sets of Ratings

                          Rating Techniques

              Same                   Different
                          Graphic vs.         Graphic vs.
              Graphic     Controlled          Forced-choice
Raters        vs. Graphic Check List          Pairs

Same            .82          —                  .57
                .69         .52                  —
Different       .30         .29                 .23
                .25         .24                 .26


Different Raters, Same Technique. Averages of the intercorrelation coefficients among graphic ratings from four different raters per ratee were determined.

Different Raters, Different Techniques. Averages of the correlation coefficients between forced-choice ratings on the ninth day and graphic ratings from three different raters per ratee on the first day were computed. The forced-choice ratings selected for this analysis were those made by the raters whose graphic ratings were omitted here.
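
The averaging of intercorrelations among several raters per ratee can be illustrated with a short sketch. It is not the authors' computing routine: the layout of the data (one row per ratee, one column per rater position), the simple arithmetic mean of the coefficients, and the example ratings are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_intercorrelation(ratings):
    """Average of all pairwise correlations between rater positions.

    ratings[i][k] is the rating given to ratee i by the rater in position k
    (a hypothetical layout; four positions would match the four raters per
    ratee mentioned in the text).
    """
    positions = len(ratings[0])
    columns = [[row[k] for row in ratings] for k in range(positions)]
    coefficients = [
        pearson_r(columns[a], columns[b])
        for a, b in combinations(range(positions), 2)
    ]
    return sum(coefficients) / len(coefficients)


if __name__ == "__main__":
    # Made-up graphic-scale ratings (1 = highest, 8 = lowest) for five ratees.
    example = [
        [2, 3, 2, 4],
        [5, 4, 6, 5],
        [3, 3, 4, 2],
        [7, 6, 6, 7],
        [4, 5, 3, 4],
    ]
    print(round(mean_intercorrelation(example), 2))
```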


Results and Conclusions 


As shown in Table 1, the highest correlations were obtained for the sets of ratings made by the same raters using the same techniques (r = .82, .69). Somewhat lower correlations were obtained for the ratings made by the same raters using different techniques (r = .57, .52). The lowest correlations were obtained for the sets of ratings made on the same ratees by different raters using the same techniques (r = .30, .25) and by different raters using different techniques (r = .29, .26, .24, .23).

It was to be expected, of course, that 
evaluations of the same ratees by the same 
raters rendered 8 days apart would be in sub- 
stantial agreement. It was also to be ex- 
pected that agreement would be less when 
the evaluations were made by different raters 
on the two occasions. However, the signifi- 
cant facts to be noted are these: (a) when 
the same raters were involved, agreement was 
greater when the same technique was em- 
ployed than when different techniques were 
employed; and (b) when different raters were 
involved it made no difference whether the 
same or different techniques were employed. 
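
The two facts just noted can be checked by simple arithmetic on the correlations quoted in the text. The sketch below averages the coefficients within each condition and takes the difference between same-technique and different-technique agreement. Averaging the r's directly is an assumption made for illustration, and the names used are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Correlations as quoted in the Results paragraph.
TABLE_1 = {
    ("same raters", "same technique"): [0.82, 0.69],
    ("same raters", "different techniques"): [0.57, 0.52],
    ("different raters", "same technique"): [0.30, 0.25],
    ("different raters", "different techniques"): [0.29, 0.26, 0.24, 0.23],
}


def technique_effect(raters):
    """Mean same-technique agreement minus mean different-technique agreement."""
    same = mean(TABLE_1[(raters, "same technique")])
    different = mean(TABLE_1[(raters, "different techniques")])
    return same - different


if __name__ == "__main__":
    for raters in ("same raters", "different raters"):
        print(raters, round(technique_effect(raters), 3))
```

With the same raters the difference is about .21; with different raters it is about .02, which is the pattern summarized under (a) and (b).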

It appears, therefore, that contamination in this study was linked to the raters. Contamination resulting from similarity of technique appeared only when the raters were identical and was virtually absent when the raters were different.

In evaluating these findings, the following limitations of the study should be borne in mind: (a) the design of the study did not permit the use of all combinations of raters and technique; (b) it did not permit varying degrees of overlap in the two sets of raters or the time interval between ratings; (c) only a limited variety of techniques were studied; (d) the rater population may not have been typical of Army raters. Nevertheless the findings were internally consistent and the following generalization stated in terms of criterion contamination may be offered: in validating rating instruments against criterion ratings, rater contamination is more serious than is technique contamination. If the raters who provide the predictor ratings are different from those who provide the criterion ratings, no technique contamination will result. If, however, the raters are the same, technique contamination may appear if the same techniques are used.


Received April 27, 1953. 


The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Validity versus Reliability 


Edward K. Strong, Jr. 


Stanford University


Which is preferable—a test with higher validity and lower reliability or a test with lower validity and higher reliability? Two recent investigations have raised this question. Neither is sure but both incline toward giving greater weight to validity than reliability. Clark (1) says, "The evidence . . . makes any decision about . . . keys to be used rather difficult, since improved separation of groups must be weighed against decreased test retest reliabilities. The alternative conclusion, that low reliability has little meaning in a situation where high validity is obtained, is of course a possibility. Had the estimate of reliability been other than a test retest measure, this would have been an attractive alternative."

Strong and Tucker (4) report: "BS scales have been selected over BO scales on the basis of greater validity. It is believed that validity is more important than reliability, that validity necessitates reliability, and that the measures of internal consistency reported herein are not complete measures of reliability."

Both of these investigations have been concerned with the selection of items for a key to differentiate two groups on the basis of their interests. Clark's aim was to select items so that different aspects of differences in interests would be represented by the same number of items in each case. He does not claim to have achieved this desideratum nor to have developed a satisfactory method of doing so but that such was his aim and some real progress was achieved.

Strong and Tucker developed keys to dif- 
ferentiate medical specialists from medical 
men-in-general. They found that the origi- 
nal internist medical specialist key did not 
differentiate internists from psychiatrists to 
any marked degree. Items were then se- 
lected that would differentiate not only in- 
ternists from medical men-in-general, but 
also internists from surgeons, pathologists and 
psychiatrists. The original scales and the 
revised scales were designated, respectively, 
BO and BS scales. 

A small sample from each of these investigations is given here to illustrate the problem. Table 1 indicates that the iterative and Gulliksen types of key are superior to the original, i.e., customary types of key, as far as validity is concerned but have appreciably lower reliability. Table 2 illustrates the same situation from data of Strong and Tucker. Differences were not great for three pairs of scales but were in the indicated direction. The really serious problem concerned the BO and BS internist scales. Here the BO scale of 278 items had a reliability of .86 in comparison with the BS scale of 69 items and reliability of .69. But the BO scale had a validity of 69 per cent overlap, biserial r of .47 in comparison with the BS scale which had a 51 per cent overlap and biserial r of .68.

It is to be noted that in both investigations


Table 1

Separation of Aviation Machinists Mates from Navy Men-in-General (Clark)

                             Per Cent Overlap
Type of                                 Cross-        Test-Retest
Key                        Original     Validation    Reliability
Original          3          65           58            .85
Iterative        42          51           51            .74
Gulliksen       497          56           51            .75


Table 2

Validity and Reliability of BO and BS Scales (Strong and Tucker)

                                     Validity
               Average       Per Cent
Type of        Number        Total        Biserial     Reliability
Scale          of Items      Overlap      r            (odd-even)
4 BO Scales      285           52          .67           .85
4 BS Scales      161           42          ar            419

a decrease in number of items resulted in decrease in reliability, although the retained items had individually greater reliability and distinctly greater validity.
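
The effect of scale length on reliability noted here can be made concrete with the Spearman-Brown formula. The article does not invoke this formula; it is brought in only as an illustration, using the internist figures quoted earlier (BO scale of 278 items with reliability .86, BS scale of 69 items with reliability .69). The function name is hypothetical.

```python
def spearman_brown(reliability, length_ratio):
    """Reliability expected when a scale's length is multiplied by length_ratio,
    assuming the added or removed items are of the same quality as the rest."""
    return length_ratio * reliability / (1.0 + (length_ratio - 1.0) * reliability)


if __name__ == "__main__":
    # Shortening the 278-item BO internist scale to 69 comparable items.
    expected = spearman_brown(0.86, 69 / 278)
    print(round(expected, 2))
```

Under the equal-quality assumption the shortened scale would be expected to have a reliability of about .60. The reported .69 for the 69-item BS scale is higher than that, which agrees with the remark that the retained items were individually more reliable.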

The data in Table 3 have just been tabu- 
lated. Their import raises anew the ques- 
tion whether increased validity can offset de- 
creased reliability. The odd-even reliabilities 


Table 3

Odd-Even Reliability of Scales and Test-Retest Correlations over an Average of 18 Years

                                           Test-Retest
                          Odd-Even         Correlation
Scales                    Reliability      N = 663
Engineer                    .94              .79*
Life Insurance              .93              .75
Chemist                     .91              19
Sales Manager               .90              .68
Real Estate                 .90              .69
Doctor                      .87              16
Farmer                      .88              .67
Lawyer                      .88              .73
Office                      .88              .65
Production Manager          .85              .67
Accountant                  .84              05
Banker                      .83              .72
President                   .82              .50
Personnel                   .82              4
Public Administrator        .76              .48**


* Previously as .76, bas a 
k Ko iene reported as .76, based on 203 cases (3). 


of 15 scales for the Vocational Interest Blank (2, p. 78) are given in the table and the corresponding test retest correlations for 663 former college students retested on the average 18 years later. The eight scales with high reliability (.88 to .94) have an average test retest correlation of .73 in contrast to a correlation of .60 for the seven scales with poorer reliability. The rank order correlation between odd-even reliability and test retest correlation is .83.

The conclusion from these data is that if one wishes to have test scores as a basis to predict behavior in the distant future one wants tests that will give as great agreement as possible between scores today and scores in the future and that the reliability of the scale is important in this connection.


Received May 19, 1953.


References

1. Clark, K. E. Research on scoring methods for the U. S. Navy Vocational Interest Inventory. Technical Report No. 5, 1952, Department of Psychology, University of Minnesota.
2. Strong, E. K., Jr. Vocational interests of men and women. Stanford: Stanford University Press, 1943.
3. Strong, E. K., Jr. Nineteen-year follow-up of engineer interests. J. appl. Psychol., 1952, 36, 65-74.
4. Strong, E. K., Jr. and Tucker, A. C. The use of vocational interest scales in planning a medical career. Psychol. Monogr., 1952, 66, No. 9.




Sampling Problems in Studies of Writing Style 


Richard D. Powers


Department of Agricultural Journalism, University of Wisconsin 


The past few years have seen a growing number of studies in which the constituent parts of writing style are examined individually and statistically. The work of Rudolph Flesch (5) and of Edgar Dale (2) in “readability scoring” is perhaps best known, but this paper is primarily concerned with lesser known developments (8, 10) and with the future of stylistic measuring devices. Formulas for readability are only a secondary consideration.

With most style studies has come the necessity of sampling. Application of stylistic measures to the total content of a book or newspaper may be unnecessary, and in many cases is physically impossible. Samples are the alternative.
Sampling theory gives a basis for estimating the minimum size of sample needed for various degrees of precision. However, the most useful tests can be applied only to random samples; in other words, to samples where each unit of the kind being studied has an equal chance of being drawn in the sample.

This brings us to a special problem of sampling in writing style. Usually two kinds of units are involved in an analysis of style:

1. Sentence characteristics: length, form, structure.
2. Word characteristics, and principally the relative difficulty of various words.


Since in the most popular style-measuring formulas sampling concerns both of these units, almost all sampling has been done by drawing samples of sentences and then analyzing these and the words in them as the particular formula requires. If, in future studies involving only word measurements, the assumption is made that this procedure gives a random sample of words, the results could be misleading, depending upon the type of stylistic measurement.


It is apparent that a 100-word sample 
drawn by using all the words in five randomly 
selected sentences does not fulfill the condi- 
tions of random selection of 100 words. And 
Baker (1) has presented evidence that selec- 
tion of 100 consecutive words in paragraphs 
could bias results of certain kinds of studies. 
Sampling by sentences in studies of word 
characteristics is actually a type of cluster 
sampling. It involves a selection of several 
connected units of analysis at one drawing. 
As such, it is a restriction of randomness 
which could cause logical and statistical diffi- 
culties in this type of study. 

For such a clustered sample, it is an error to apply the formula $SE = \sigma / \sqrt{N}$ for determining the precision of the sample. Of course,
there are ways to evaluate the precision of a 
clustered sample. But to do that, we must 
have additional information, such as the 
amount and direction of intercorrelations be- 
tween the elements within the clusters. Data 
on the intercorrelations between words in sen- 
tences are necessarily very meager, so we have 
no way of knowing how we affect the pre- 
cision of our samples by clustering. 
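One way to see why the intercorrelations matter is the approximate "design effect" used in later survey-sampling practice, 1 + (m - 1)ρ, where m is the average cluster size and ρ the correlation between elements in the same cluster. The formula is not part of this paper, and the ρ values below are hypothetical, since the paper stresses that such data are not available. The sketch only shows how quickly the simple-random-sampling standard error can understate the error of a sentence sample of the size used later in this study (997 words in 64 sentences).

```python
import math


def design_effect(mean_cluster_size: float, rho: float) -> float:
    """Approximate variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * rho


def clustered_se_of_proportion(p: float, n: int, mean_cluster_size: float, rho: float) -> float:
    """Standard error of a proportion after inflating the simple-random-sampling value."""
    srs_se = math.sqrt(p * (1.0 - p) / n)
    return srs_se * math.sqrt(design_effect(mean_cluster_size, rho))


if __name__ == "__main__":
    words_per_sentence = 997 / 64  # average cluster size in the sentence sample
    for rho in (0.0, 0.02, 0.05):  # hypothetical within-sentence correlations
        deff = design_effect(words_per_sentence, rho)
        se = clustered_se_of_proportion(0.5, 997, words_per_sentence, rho)
        print(f"rho={rho:.2f}  design effect={deff:.2f}  SE={se:.4f}")
```

With ρ = 0 the clustered and simple random standard errors coincide; any positive ρ makes the sentence sample less precise than the simple formula suggests.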

Some relationships between words in a sentence are obvious, though. For instance, when “the” appears in a sentence, it is usually followed by an adjective or a noun, sometimes by an adverb, but almost never by a verb. And, up to a point, the more words preceding a given word in a sentence, the more the nature of that word is predetermined.

Because of this verbal contextual effect (one word increasing or decreasing the probability that the words which follow will be certain words or types of words), it would seem that some kinds of words might be over-represented when samples of sentences are drawn for studies of word characteristics. Likewise, it would seem that some kinds of words might be under-represented.




Simple random sampling could safeguard 
against these difficulties. Theoretically, more- 
over, a study of sentence length must be done 
by drawing a random sample of sentences; a 
study of word lengths or proportions of parts 
of speech calls for a random sample of words; 
and, theoretically again, a study of clause 
structure must be made on a random sample 
of clauses. But we compromise between pure 
theory and practical considerations almost 
every day, with no drastic results and often 
with considerable economy. 


Procedure 


This study, then, was an empirical attempt to see how “cluster” sampling (sampling by sentences) affected certain arbitrarily selected word variables: (a) representation of different parts of speech in the sample; (b) proportion of “hard” words (defined as words not in Edgar Dale’s list of 3,000 words); (c) proportions of words of different syllable lengths (words of one or two syllables were called “short” words for the purposes of this study); and (d) proportions of “structural” words (defined as prepositions, conjunctions, and articles; an exclusive, though not inclusive, category).

The two samples were drawn by equally random methods. The sample of 1,000 words was picked one word at a time by a table of random numbers. The sample of 64 sentences (997 words) was also picked by a table of random numbers.

For the word sample, the first four numbers of the random number table designated
the page number, the next number designated 
the paragraph number on that page or the 
following one, and the next two numbers 
indicated the word in the selected paragraph 
(or the following one) which was to be drawn 
into the sample. The sentence sample was 
selected using the first four random numbers 
for the page number and the next two num- 
bers for the sentence number on that page or 
the following one. The samples were drawn 
from a three-volume report the U. S. Depart- 
ment of Agriculture prepared for a congres- 
sional committee (12). 
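The two drawing procedures can be sketched in a few lines. This is a simplified illustration, not the author's procedure: the text is assumed to be already split into sentences (each a list of words), a pseudo-random generator stands in for the table of random numbers, and the page and paragraph indexing is omitted. The function names and the toy text are hypothetical.

```python
import random


def draw_word_sample(sentences, n_words, rng):
    """Simple random sample: every word in the text has an equal chance of selection."""
    all_words = [word for sentence in sentences for word in sentence]
    return rng.sample(all_words, n_words)


def draw_sentence_sample(sentences, n_sentences, rng):
    """Cluster sample: draw whole sentences at random and keep every word in them."""
    chosen = rng.sample(sentences, n_sentences)
    return [word for sentence in chosen for word in sentence]


if __name__ == "__main__":
    # Hypothetical toy text; a real study would read the report itself.
    toy_text = [
        ["the", "report", "was", "prepared", "for", "a", "committee"],
        ["samples", "are", "the", "alternative"],
        ["random", "sampling", "could", "safeguard", "against", "these", "difficulties"],
        ["clusters", "restrict", "randomness"],
    ]
    rng = random.Random(1954)  # fixed seed so the illustration is repeatable
    print(draw_word_sample(toy_text, 5, rng))
    print(draw_sentence_sample(toy_text, 2, rng))
```

The word sample has a fixed size, while the size of the sentence sample depends on which sentences happen to be drawn, which is why the study's sentence sample came to 997 words rather than an even 1,000.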

This particular report was used for sampling because the first aim of the study had been to establish grammatical and vocabulary differences between a “popularized” version and a rather technical version of the same material. Both kinds of writing were contained in the report. The difficulty in determining sample size led to the present study, the original intent having been dropped.

A sample of 1,000 words was the size selected because, on assumptions of simple random sampling, this size assures accuracy in word studies to within three per cent of the sample value 95 per cent of the time when allowing for maximum variability. Most of the time, the precision of such a sample would approach two per cent for measurements of parts of speech. A rough analysis of the accumulated percentages by 50-word subsamples shows that, in general, the sentence sample was more stable than the word sample. That is, the curve levelled off at an earlier point than for the word sample.
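The three per cent figure can be checked with the usual formula for the sampling error of a proportion under simple random sampling. The sketch below is only an illustration: the multiplier 1.96 for 95 per cent confidence and the function name are assumptions, and "maximum variability" is taken to mean a proportion of one half.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Maximum variability: a proportion of one half in a sample of 1,000 words.
    print(round(margin_of_error(0.5, 1000), 3))
    # A part of speech making up about 14 per cent of the words.
    print(round(margin_of_error(0.14, 1000), 3))
```

For a proportion of one half the margin comes to about three per cent, and for the smaller proportions typical of single parts of speech it falls to about two per cent, in line with the figures in the text.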

The significance of the difference between the two samples for each measure was established by a “t” score: the difference between the two, divided by the standard error of the difference.
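A sketch of such a score for two proportions follows. The paper does not state how the standard error of the difference was computed, so the unpooled form used here is an assumption, and the result need not match the printed table exactly. The counts in the example are the “hard” word counts from Table 2 (386 of 1,000 words and 318 of 997 words).

```python
import math


def t_score_for_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """Difference between two sample proportions divided by its standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    se_difference = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return (p1 - p2) / se_difference


if __name__ == "__main__":
    # "Hard" words: word sample versus sentence sample.
    t = t_score_for_proportions(386, 1000, 318, 997)
    print(round(t, 2), "significant" if abs(t) >= 1.96 else "not significant")
```

A score of 1.96 or more in absolute value corresponds to the five per cent level used in the tables.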


Results 


Table 1 shows that for measurement of the proportion of different parts of speech, drawing the sample by sentences did not significantly affect the results. (A word was classified as a particular part of speech on the basis of traditional grammatical rules, with the exception that nouns modifying other nouns were classified as adjectives.)

However, as shown in Table 2, the results were significantly different for the measurements of proportions of “short” words, “hard” words, and “structural” words. These categories are more rigidly defined and, logical reasoning would tell us, are more meaningful in measuring such style aspects as the reading ease of writing. No significant differences were obtained when the proportions of nouns, adjectives, and verbs were lumped together in a manner similar to that of the “structural words” category. Since the structural words are generally short words that are included in the Dale 3,000 word list,


Table 1

Proportions of the Various Parts of Speech in a Sample Drawn Randomly by Words and a Sample Drawn by Selecting All Words from Randomly-drawn Sentences

                  Sample Drawn    Sample Drawn
                  by Words        by Sentences    Difference
                  %      N        %      N        %      N       t
Nouns             31     313      31     303      0      10      [illegible]
Adjectives        20     202      17     166      3      36      1.73
Verbs             14     136      14     137      0      1       [illegible]
Adverbs           1      11       1      13       0      2       [illegible]
Prepositions      18     185      19     194      1      9       .57
Articles          7      68       8      82       1      14      [illegible]
Conjunctions      5      47       6      56       1      9       [illegible]
Pronouns          4      36       5      46       1      10      [illegible]

* Score of 1.96 statistically significant.
these latter three factors are probably interrelated.

Two hypotheses suggested by this part of the study are that the measure of average syllable length per word is a less sensitive measure of word difficulty than the measure of proportions of words of different syllable lengths, or that it may require huge samples to reveal smaller shades of difficulty. This might indicate that Gunning’s Fog Index (7) is a measure based at least in part on stronger premises than Flesch’s readability formula or that the Flesch formula may require larger samples than we have thought. It might also merit study in light of the recent controversy between Farr, Jenkins, and Paterson (3, 4) vs. Klare (9) and Flesch (6) as to whether or not a count of one-syllable words would suffice to indicate vocabulary difficulty. Though the actual behavior and evaluation of these two measurements is a subject for further detailed study, Table 3 shows the proportions of words of various syllable lengths obtained in this study.


Discussion 


The findings of this study can safely supply to the field of style analysis only a caution: that as measurements of style variables become more refined, the sampling methods may also have to become more refined. In other words, as style analysis goes from rather crude and vaguely-defined measurements such as proportions of parts of speech proposed by Stormzand and O’Shea (11) to the “psychogrammatical” categories proposed by Sanford (10) or to other refined ratios and categories (8), sampling by the

Table 2

Proportions of “Short” Words, “Hard” Words, and “Structural” Words in a Sample Drawn Randomly by Words and a Sample Drawn by Selecting All Words in Randomly-drawn Sentences

                        Sample Drawn    Sample Drawn
                        by Words        by Sentences    Difference
                        %      N        %      N        %      N       t
“Short” Words           73     726      77     768      4      42      2.06*
“Hard” Words            39     386      32     318      7      68      390
“Structural” Words      30     300      33     332      3      32      [illegible]
Syllables per Word      1.89            1.84            .05            03

* Score of 1.96 statistically significant.




Table 3

Proportions of Words of Various Syllable Lengths in a Sample Drawn Randomly by Words and a Sample Drawn by Selecting All Words from Randomly-drawn Sentences

                   “Short” Words             “Long” Words
                   1 syll.      2 syll.      3 syll.      4 syll.            5 syll.
                   N     %      N     %      N     %      N            %     N            %
Word Sample        509   51     217   22     143   14     99           10    [illegible]
Sentence Sample    545   55     223   22     125   13     [illegible]  7     32           3


traditional method of drawing words in sen- 
tences and in paragraphs should be examined 
carefully to see that the clustering of units 
does not adversely affect the results to such 
a degree that the efficiency of sampling was 
a false economy. 


Summary 


As our techniques of studying language be- 
come more refined, we need to take a closer 
look at our sampling methods. 

The samples usually drawn for language 
studies are made up of clusters of words, in 
sentences or paragraphs. In some studies, 
the words so collected have been subjected to 
further analysis. 

Such a sample is a “clustered” sample, and 
a network of unknown intercorrelations be- 
tween the words interferes with the known 
probability that any unit of the universe will 
be drawn in the sample. Thus, for some 
word characteristics at least, sampling by sen- 
tences would bias the sample of words in an 
undetermined direction.

Random sampling of words is suggested as 
a way to sidestep such difficulties in word 
studies. A comparison of the two sampling 
methods (clustered and simple random) indi- 
cates that a clustered sample significantly 
overestimated the percentage of “short” 
words, “structural” words, and “easy” words. 
It is suggested that the structure of the sen- 
tence (the need to have many short and easy 


connective words) has imposed an orderliness that has biased the clustered sample.


Received May 29, 1953.


References 


1. Baker, S. J. A linguistic law of constancy. J. gen. Psychol., 1951, 44, 113-120.
2. Dale, E. and Chall, J. S. A formula for predicting readability. Ed. Res. Bull., Ohio State Univ., 1948, 27, 37-54.
3. Farr, J. N., Jenkins, J. J. and Paterson, D. G. Simplification of the Flesch reading ease formula. J. appl. Psychol., 1951, 35, 333-337.
4. Farr, J. N., Jenkins, J. J., Paterson, D. G. and England, G. W. Reply to Klare and Flesch on “Simplification of Flesch reading ease formula.” J. appl. Psychol., 1952, 36, 55-57.
5. Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-223.
6. Flesch, R. Reply to “Simplification of Flesch reading ease formula.” J. appl. Psychol., 1952, 36, 54-55.
7. Gunning, R. D. The technique of clear writing. New York: McGraw-Hill Co., 1952.
8. Johnson, W. People in quandaries. New York: Harper, 1946.
9. Klare, G. R. A note on “Simplification of the Flesch reading ease formula.” J. appl. Psychol., 1952, 36, 53.
10. Sanford, F. H. Speech and personality. Psychol. Bull., 1942, 39, 811-845.
11. Stormzand, M. J. and O’Shea, M. How much English grammar? Baltimore: Warwick and York, 1924.
12. U. S. Government Printing Office. Report on Research and Related Activities of the U. S. Department of Agriculture. Washington, D. C., 1951.




Differential Prediction of Academic Success at Brigham Young University¹


Joics B. Stone 


Brigham Young University 


Research on prognosis of college academic success (1, 2, 3, 4) has focused on three phases of the problem: (a) prediction of general scholarship; (b) prediction of scholarship in specific subjects or subject groups; and (c) differential prediction in major areas or curricula. The most effective predictor variables have proved to be high school grade-point average, some measure of scholastic aptitude, and an objective measure of high school achievement. Multiple correlations have proved more efficient, generally, than zero-order correlations.

The present study represents an attempt to develop multiple regression equations which can be used in the differential prediction of academic success in four college curricula at Brigham Young University: (a) commerce; (b) elementary education; (c) physical sciences; and (d) social sciences.


Plan of the Study 


Curriculum Components. The four curricula studied included the following academic departments:

1. Commerce: accounting, business administration, finance and banking, and marketing.
2. Elementary education.
3. Physical sciences: chemistry, geology, mathematics, and physics.
4. Social sciences: history, political science, and sociology.

Criterion. The criterion was selected to conform to those curricula. The curriculum grade-point average (CGPA) was selected as the measure of the criterion. Only courses essential to the curriculum were used in computing the CGPA. A minimum of thirty curriculum credit-hours was required for each student. This minimum represented from one-half to two-thirds of the departmental major requirement for graduation. The reliability of the criterion was checked by correlating the CGPA of the first [...] curriculum credit against the [...]. For the respective criteria the reliability coefficients were: commerce, .78; elementary education, .82; physical sciences, .79; and social sciences, .68.

¹ This study is a portion of a dissertation presented in partial fulfillment of the requirements of the degree of doctor of philosophy at the University of Utah. The writer is particularly indebted to Dr. [...] and Dr. R. D. Willey, who served variously as chairman of the dissertation committee.
Students. The commerce curriculum included 102 students; 123 were in elementary education; 133 in the physical sciences; and 78 in the social sciences. Except for the elementary education group, there was a predominance of male students in each group.

Predictor Variables. The total high school grade-point average (HSGPA) and two tests were used. The tests were part of the entrance battery of this university. The 1949 editions of the American Council on Education Psychological Examination (ACE) and the Cooperative General Culture Test (CGCT) were used. Subtest scores and total scores were tabulated. The Wherry-Doolittle method of test battery selection was used.

Results 


The most efficient single predictor of cur- 
riculum success was the HSGPA. In com- 
bination with the ACE Total score, it sup- 
plied the most efficient batteries, with an ad- 
ditional factor in the social science curriculum 
and two in the physical sciences. 

The multiple correlations for the most efficient battery and each curriculum are shown in Table 1. Also shown are the respective Index of Forecasting Efficiency (E), the Coefficient of Determination (R²), and the Standard Error of R.
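The three derived indices can be computed from R and N alone. The sketch below uses the textbook definitions E = 100(1 - √(1 - R²)) and SE of R = (1 - R²)/√N. The paper does not state its formulas, so both are assumptions, although they appear to reproduce the tabled values for the elementary education battery (R = .731, N = 123). The function name is hypothetical.

```python
import math


def prediction_indices(r: float, n: int):
    """Return (R squared, index of forecasting efficiency in per cent, standard error of R)."""
    r_squared = r * r
    alienation = math.sqrt(1.0 - r_squared)    # coefficient of alienation, k
    efficiency = 100.0 * (1.0 - alienation)    # E: per cent reduction in error of estimate
    se_r = (1.0 - r_squared) / math.sqrt(n)    # large-sample standard error of R (assumed form)
    return r_squared, efficiency, se_r


if __name__ == "__main__":
    r_squared, efficiency, se_r = prediction_indices(0.731, 123)
    print(round(100 * r_squared, 1), round(efficiency, 1), round(se_r, 3))
```

Note that E rises much more slowly than R²: a battery that accounts for over half the criterion variance still reduces the error of prediction by only about a third.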

The most efficient battery for predicting academic success in the commerce curriculum included the HSGPA and the ACE Total score. This battery accounted for 40.1 per cent of the criterion variance, compared to 35 per cent for the best single predictor (HSGPA).

These two factors, HSGPA and ACE Total, also comprised the most efficient battery for predicting success in the elementary education curriculum. This battery accounted for 53.4 per cent of the criterion variance, compared to 45 per cent for HSGPA alone.


Table 1

Multiple Correlations of Certain Predictor Variables and Success in Four Curricula at Brigham Young University

Curriculum             N     Battery                                R      SE of R   E      100 R²
Commerce               102   HSGPA & A.C.E. Total                   .633   .060      23.4   40.1
Elementary Education   123   HSGPA & A.C.E. Total                   .731   .042      31.8   53.4
Physical Science       133   HSGPA, A.C.E. Total, CGCT Literature   .733   .040      32.0   53.7
                             & General Science
Social Science         78    HSGPA, A.C.E. Total,                   .507   .084      13.8   25.7
                             CGCT General Science


The factors, HSGPA and ACE Total, were 
supplemented by the CGCT Literature and 
General Science sub-tests in providing the 
most efficient battery for predicting success 
in the physical sciences. This battery ac- 
counted for 53.7 per cent of the criterion 
variance, compared to 33 per cent for 
HSGPA, alone. 

The factor, CGCT Literature, dropped out 
of the above battery in providing the most 
efficient battery for predicting success in the 
social sciences, leaving the HSGPA, ACE 
Total, and CGCT General Science. This 
battery accounted for 26 per cent of the cri- 
terion variance, compared to 18 per cent for 
the ACE Linguistic score. It should be noted 
that the criterion reliability for this cur- 
riculum was substantially lower than that of 
the other curricula. 

Multiple regression equations and conver- 
sion tables were prepared for each of the 
above batteries. It is possible for a counselor 
at Brigham Young University to take the 
student’s HSGPA, ACE Total score, and 
CGCT Literature and General Science scores, 
and determine the predicted grade-point av- 
erage (PGPA) for that student in any one 
or all of the four curricula studied.


Summary 


1. The utilization of entrance test data and high school grade-point average provides the counselor at Brigham Young University with the basis for making differential predictions of academic success in four curricula.

2. For commerce and elementary education, the most efficient battery included the HSGPA and ACE Total scores. The respective R’s were .633 and .731.

3. The physical sciences criterion was best predicted by a battery including the HSGPA, ACE Total, and CGCT Literature and General Science. R for this battery was .733.

4. The social science predictor battery included the HSGPA, ACE Total and CGCT General Science. R was .507.

5. The best single predictor was the HSGPA.

6. The reliability coefficients of the criterion measure (CGPA) clustered around .80 except for the social science curriculum with an r of .68.


Received June 2, 1953. 


References 


1. Crawford, A. B. and Burnham, P. S. Forecasting college achievement. New Haven: Yale University Press, 1947.
2. Monroe, W. S. (editor). Encyclopedia of educational research. New York: The Macmillan Company, 1950. Pp. 882-886.
3. Wallace, W. C. Differential predictive value of the A.C.E. Psychological Examination. Sch. & Soc., 1949, 70, 23-25.
4. Wolf, R. R., Jr. Differential forecasts of achievement and their use in educational counseling. Psychol. Monogr., 1939, 51, 1-53.




Performance of College Students on a Mechanical Knowledge 
Test 


Benjamin Balinsky and Charles Hujsa 
City College of New York 


When given the SRA Mechanical Aptitude Test as part of a course in Vocational Psychology, the students commented that they did not do as well on the Mechanical Knowledge subtest as on the Space Relations and Shop Arithmetic subtests. In order to test the comment, the SRA Mechanical Aptitude Test results of 112 students were tabulated. All students were in either the junior or senior year of the School of Business of the City College of New York and between the ages of 19 and 22.

The Revised Minnesota Paper Form Board 
Test was available on the 112 students, and 
the Location, Blocks, and Pursuit subtest 
scores of the MacQuarrie Test for Mechani- 
cal Ability on 50 of the 112 students. These 
tests were also included in the study. 

Test intercorrelations were calculated for 
all combinations of tests. Tests of signifi- 


Table 1

Test Intercorrelations for College Students†

                  Mech.    Space   Shop     Total   Rev. Minn.
                  Knowl.   Rel.    Arith.   SRA     P.F.B.       Loc.   Bl.   Pur.
et Knowl. — 09 07 oa" 19* 20 28* 23 
Shee Ral, ‘18* ‘30"* jon" 18 42K “aa 
OP Arith. “44** 41 30* 14 33* 
otal SRA 36%" 31* 30% ae 
Rev. Minn. P.F.B M7 52" 50** 
-Ocation i AS) .24* 
locks 32** 
Pursuit = 


r (Total SRA Minus Mech. Knowl. × Mech. Knowl.) = .12
r (Total SRA Minus Space Rel. × Space Rel.) = .18*
r (Total SRA Minus Shop Arith. × Shop Arith.) = .15

† The correlations with the Location, Blocks and Pursuit subtests are based on 50 subjects; others on 112.
* Significant at the 5% level.
** Significant at the 1% level.


Table 2

Means, Standard Deviations and Tests of Significance for College and SRA Male Trainee Groups

                   College Group           SRA Male Trainees
                   Mean     S.D.           Mean     S.D.        t        P
Mech. Knowl.       25.0     6.3            31.8     7.1         10.27    <.001
Space Relations    20.2     [illegible]    19.0     4.7         3.19     <.01
Shop Arith.        14.4     3.7            9.8      3.8         11.90    <.001
Total              59.6     10.6           60.5     12.2        0.78     .42


cance were computed for the difference be- 
tween the means of the test scores of the col- 
lege students and the norm group of male 
trainees in the SRA Mechanical Aptitude 
Test. These data are presented in Tables 1 
and 2. 

Incidentally, the mean Mechanical Knowl- 
edge score of the students is at the 19th per- 
centile of the male trainee norms, the mean 
Space Relations at the 55th percentile, and 




the mean Shop Arithmetic at the 87th per- 
centile. The mean score of the college stu- 
dents on the Revised Minnesota Paper Form 
Board is at the 70th percentile of the norms for machine and electrical apprentice applicants, and the difference between the means of the two groups is significant at < .001 in favor of the students.


Received June 19, 1953. 




Relation of Scholastic Aptitude to Socioeconomic Status and to 
a Rural-to-Urban Continuum 


Norman F. Washburne and Dean C. Andrew 
Southern State College, Magnolia, Arkansas 


Scholastic aptitude tests play an important part in modern education. They are used for the purpose of aiding students in selecting courses and vocations, and have a number of other uses in guidance, counseling, and other aspects of research. At Southern State College it is the practice to administer the college level ACE Psychological Examination to all entering freshmen. Where such a practice prevails, it is desirable to determine if the test measures what it is intended to measure, and what factors, if any, might bias the results of the test.

The student body of Southern State College is composed primarily of residents of Southern Arkansas, [...] Louisiana, and [...]. It is a regionally homogeneous population. There are no significant immigrant groups, and the undergraduate students are all Caucasians. However, the population is quite varied in two respects: (1) the socioeconomic status of the individual students; and (2) the sizes of the communities in which the students have grown up. The question therefore arises, do these variations in socioeconomic status and degree of urbanization affect the scholastic aptitude of the students as measured by the ACE Psychological Examination?

In order to test this question, a sample of 100 students was drawn at random from those who had been enrolled in April 1952.¹ These students had been given the ACE and had also filled out a questionnaire which made it possible to determine their socioeconomic statuses and the relative degrees of urbanization of their residence histories. Coefficients of total and partial correlation were computed in order to determine the relationships among the three variables.


The Scholastic Aptitude Test. The ACE Psychological Examination² is designed to measure the scholastic aptitude of college freshmen. Its scoring yields three measures: the Q score; the L score; and the total score. The Q score is a measure of the respondent’s ability to solve problems of a quantitative nature. The range of the Q scores of our sample was from 9 to 60. The L score is the measure of the respondent’s ability to solve problems of a linguistic nature. The range of the L scores of our sample was from 27 to 92. The total score is a sum of the Q and L scores and is a measure of the total scholastic aptitude of the respondent. The range of the total scores of our sample was from 36 to 142.

The Socioeconomic Status Scale. The socioeconomic status scale is an instrument designed to quantify the social and economic position of the college student.³ Its scores are based upon the occupation of the student’s father, and upon the educational attainments of both of the student’s parents. The occupational and educational factors are weighted equally. The scale differs from some other socioeconomic status scales used in similar studies in two ways: (1) its occupational factor is not arbitrarily scored, but rather is based upon the students’ own evaluation of occupations representative of those of their fathers; and (2) it does not assume that social classes exist as discrete

¹ This sample size approximates one-sixth of the undergraduate student body at the time.

² American Council on Education Psychological Examination, 1948 College Edition, Educational Testing Service, Cooperative Test Division (Princeton, New Jersey); 1951.

³ Details of the construction and validation of the Socioeconomic Status Scale and the Residence History Scale are to be found in Norman F. Washburne, “Urban” Attitudes and Responses as Related to Residence in Urban Communities and to Socioeconomic Status, Ph.D. Dissertation, Washington University, St. Louis, Missouri, 1953. This work has also been published in mimeographed form as an Institutional Study of Southern State College, Magnolia, Arkansas, and a limited number of copies are available on request.


Table 1

Total and Partial Correlation of Scholastic Aptitude Scores with Socioeconomic Status Scores, 100 Southern State College Students

                Coefficient of       Probability of     Partial Correlation with          Probability of
                Total Correlation    Null Hypothesis    Residence History Held Constant   Null Hypothesis
Q score         .024                 P > .05            .024                              P > .05
L score         .123                 P > .05            .070                              P > .05
Total score     .166                 P > .05            .115                              P > .05

r of Socioeconomic Status vs. Residence History = +.19.


cultural units, but rather assumes a con- 
tinuum of socioeconomic statuses. The points 
of the scale are handled statistically as if 
they were the midpoints of intervals along 
the continuum. The theoretical as well as 
the actual range of the socioeconomic status 
scale is from 2 to 10. 

To clarify the meaning of the scale the following examples are offered: The father of one student scoring 10 on the socioeconomic status scale is an owner of a large manufacturing plant. Both of the student’s parents had gone to college and one of them had taken graduate work beyond the baccalaureate degree. On the other end of the scale the father of one student who scored 2 is a day laborer on a farm. Neither of this student’s parents had completed the sixth grade in school.

The occupational factor of the scale correlated highly with North and Hatt’s similar scheme⁴ and so is judged to be valid. Socioeconomic status scores were computed for a sample of 100 students from data gathered on two different occasions, and the scale was found to be reliable.

Residence History Scale. The residence history scale is an instrument designed to quantify the degree of urbanization of the backgrounds of individuals. It is, as far as we know, the first instrument which goes beyond the simple characterization of students’ home-towns as being either rural or urban. It is a complicated device which takes into account the size, degree of isolation, and proximity of larger urban centers of all the individual’s places of residence from the time he entered the first grade until the present. It also takes into account the length of time the individual spent in each place of residence. The residence history scale assumes a rural-to-urban continuum. It has a theoretical range from 0 to 50. A score of 0 would indicate that the individual has lived all his life more than 100 miles away from the nearest community of 250 population. At the other extreme, a score of 50 would indicate that the student has lived all his life within 6 miles of a city of a population of at least one-half million. The actual range of the residence history scores of our sample was from 10 to 48. Residence history scores were computed for 100 students from data gathered on two different occasions, and the scale was found to be reliable.

⁴ Cecil C. North and Paul K. Hatt. Jobs and occupations: a popular evaluation. Opinion News, September 1, 1947, pp. 3-13.


Results


The relationships between the scores of the scholastic aptitude test and the socioeconomic status scores of the sample are presented in Table 1.

It can be seen from Table 1 that all coefficients of total correlation were low and statistically not significant. However, as the coefficient of total correlation between residence history scores and socioeconomic status scores was found to be +.19, it seemed feasible to seek an understanding of the effects of each of the factors upon the scholastic aptitude scores when the other was held constant. Table 1 therefore also presents the coefficients of partial correlation of the

Table 2

Total and Partial Correlation of Scholastic Aptitude Scores with Residence History Scores, 100 Southern State College Students

                Coefficient of       Probability of     Partial Correlation with              Probability of
                Total Correlation    Null Hypothesis    Socioeconomic Status Held Constant    Null Hypothesis
Q score         .245                 P < .01            .245                                  P < .01
L score         .302                 P < .01            .286                                  P < .01
Total score     .308                 P < .01            .295                                  P < .01


scholastic aptitude scores with the
socioeconomic status scores while the
residence history scores are held constant.
None of the resulting relationships are shown
to be significant.

The relationships between residence history
scores and the scholastic aptitude scores are
presented in Table 2.

The coefficients of total correlation between
residence history scores and the scholastic
aptitude scores are shown in Table 2 to be
significant at the one per cent level. That
means that the relationship between the factors
would happen less than one time in a
hundred by chance. When coefficients of
partial correlation of the scholastic aptitude
scores with the residence history scores were
calculated while socioeconomic status scores
were held constant, all of the resulting coefficients
were found to be lower than the total
correlations with the exception of the Q score
which remained the same. However, even
these lower coefficients of partial correlation
were found to be significant at the
one per cent level. All of the relationships
are in the direction of rural-to-urban, i.e.,
the more urban the background the greater
the scholastic aptitude.
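
The partial coefficients reported above are first-order partial correlations. As a minimal sketch of that computation, the standard formula can be written as below. The function name is illustrative, and the aptitude-status correlation of .05 is a hypothetical low value chosen only for the example; of the inputs, only .245 and .19 appear in this report.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator

# r_xy: Q score with residence history (.245, from Table 2).
# r_yz: residence history with socioeconomic status (+.19, from the text).
# r_xz: Q score with socioeconomic status; 0.05 is a HYPOTHETICAL low value
#       used only for illustration, not a figure from the report.
r_partial = partial_correlation(0.245, 0.05, 0.19)
print(round(r_partial, 3))
```

When the third variable correlates only weakly with both of the others, the partial coefficient stays close to the total coefficient, which is the pattern seen in Table 2.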


Summary and Conclusions 


This investigation attempted to discover 
the relationships between scholastic aptitude, 
socioeconomic status, and placement of the 
individual upon a rural-to-urban continuum, 
as these variables applied to Southern State
College students. The results seem to justify 
the following conclusions: 

1. For this group of college students there 
is no significant relationship between socio- 
economic status and scholastic aptitude as 
measured by the ACE Psychological Exami- 
nation. 

2. There is a significant, though low, correlation
between placement of the students’
residence history upon a rural-to-urban con- 
tinuum, and their scholastic aptitude as meas- 
ured by the ACE Psychological Examination. 
That is to say that students from more urban 
backgrounds tend to receive higher scores 
than do students from rural backgrounds. 

Because these findings apply only to Southern
State College students, it is suggested
that further research be conducted on students
in other schools and in other regions,
to see if the findings are confirmed. 


Received May 14, 1953. 




The Journal of Applied Psychology
Vol. 38, No. 2, 1954 


Further Results on Group Manual Dexterity in Men 


Andrew L. Comrey and Gerald Deskin 


The University of California at Los Angeles 


In a previous experiment,¹ 65 pairs of volunteer
male university students were given
six individual trials on the Purdue Pegboard, 
Assembly Task, and six trials on the Assem- 
bly Task with the two members of each pair 
working together on the same assemblies 
rather than individually on separate boards. 
The members of each pair were divided on 
the basis of the total of the last four indi- 
vidual trials, Assembly Task, into “high” 
and “low” categories. Reliabilities were de- 
termined for “high,” “low,” and “group” 
performances, using alternate trials and cor- 
recting for doubled length. Correlations of 
the “high” and “low” performances with the 
“group” performance and with each other 
were computed and corrected for attenua- 
tion. The multiple correlation and regression 
weights were obtained for predicting “group” 
performance from “high” and “low” indi- 
vidual performances. The results showed 
that less than half the group performance 
variance could be predicted from a knowl- 
edge of the individual performances, even 
with the effects of errors removed. The level 
of group performance was only slightly more 
dependent on the “low” individual perform- 
ances. For all practical purposes, equal 
weights could have been used for “high” and 
“low” scores in predicting “group” performance.
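
The two corrections mentioned above are standard ones. A minimal sketch follows, assuming the usual Spearman-Brown formula for doubled length and the usual correction for attenuation in both variables. The function names are illustrative and the input values are hypothetical, not figures from either experiment.

```python
import math

def spearman_brown_doubled(r_half):
    """Reliability of the full-length score from the correlation of two halves."""
    return 2.0 * r_half / (1.0 + r_half)

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores, given both reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)

# HYPOTHETICAL illustration: alternate-trial halves correlating .80, and an
# observed correlation of .45 between two scores with reliabilities .90 and .85.
print(round(spearman_brown_doubled(0.80), 3))
print(round(correct_for_attenuation(0.45, 0.90, 0.85), 3))
```

Because both reliabilities are at most 1, the corrected correlation is never smaller in absolute value than the observed one.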

The present experiment was designed to 
provide a check on the first experiment and 
to determine the effect of an alteration in the 
nature of the individual task on the amount 
of group variance which could be predicted. 
One of the hypotheses offered to account for 
the fact that much of the variance in the 
group performance scores could not be pre- 
dicted from a knowledge of the individual 
performances was that the two tasks might 
have been too unlike each other. Although
the same end product resulted in both individual
and group performance, the latter required
the subjects to alternate the operations
they performed on successive assemblies.
The first subject, for example, would place
a peg in the first hole on his side of the
board, after which the second subject would
add a washer and the first subject would
follow with a collar, and finally the second
subject would complete the assembly with
another washer. Instead of repeating this
operation, however, the subject who finished
the assembly would begin the next assembly
by placing a peg in the second hole on his
side. The first subject would then place
the first washer, and so on.

¹ Comrey, A. L. Group performance in a manual dexterity task. J. appl. Psychol., 1953, 37, 207-210.

Since in the individual task, the subject
performed each assembly just like the previous
one, he was not confronted with the additional
task of altering his set for each subsequent
assembly, as he was required to do
for the group performance task. It was hypothesized
that this requirement for altering
set might have introduced additional
abilities into the task which were not present
in the individual task, thereby lowering the
validity of the individual performance scores
for predicting the group performance scores.

To test this hypothesis, the experiment was
repeated using a redesigned individual task
which required the subjects to make a change
of set on each assembly like that to be required
later in the group task. Instead of
using the standard instructions for the Purdue
Pegboard, Assembly Task, the subjects were
instructed to begin each assembly after the
first one with the same hand used to place
the final washer on the preceding assembly.
In this way, the subject was required to do
alternate assemblies with reversed hand operations,
substituting the left hand for the
operations previously performed with the
right hand, and vice versa.




Table 1

Summary of Results

                                            Corrected r with
Score      Mean     S.D.    Reliability    High     Low     Group    Beta Weights
High        156     18.0       .89         1.00     .50      .55        .31
           (192)   (16.5)     (.90)                (.52)    (.56)      (.35)
Low         137     17.2       .94          .50    1.00      .64        .49
           (173)   (16.8)     (.92)        (.52)            (.59)      (.41)
Group       186     19.2       .77          .55     .64     1.00
           (178)   (19.2)     (.87)        (.56)   (.59)

                         R = .69     R² = .48
                            (.66)        (.44)
Results and Discussion


In every way except those differences already
mentioned, the experimental procedure
and analysis of the data were the same as
for the first experiment,² and therefore will
not be repeated here. The sample was made
up entirely of undergraduate men this time,
whereas about one-third of them were graduate
students before; the previous 65 pairs of
subjects was reduced to 47 pairs for this experiment.
The results are summarized in
Table 1. The figures included in parentheses
show the results from the first experiment
while the numbers immediately above them
are the corresponding values in the present
experiment.

In the first column of Table 1 are listed
the score categories, “high,” “low,” and
“group,” standing, respectively, for those total
performances already described. The
means and standard deviations of the three
sets of scores are given in the second and
third columns, respectively. These are based
on the totals of the last four of six trials.
This measure was used to obtain greater
stability. In the fourth column are given the
split-half reliability estimates for the three
total scores. The next three columns of
Table 1 give the intercorrelations of the total
score variables, corrected for attenuation
in both variables. The last column contains
the beta weights for predicting “group” performance
from “high” and “low” individual
performances. The multiple correlation, R,


² Comrey, A. L., op. cit.


and R², are given in the bottom row of the
table. Both the beta weights and R were 
computed using the corrected correlations. 
The uncorrected correlations for the present 
experiment were: low-high, .46, low-group, 
.53, and high-group, .47. For the previous 
experiment, the corresponding uncorrected 
correlations were .48, .53, and .50. 
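
With two predictors, the beta weights and the multiple correlation follow directly from the three corrected correlations. The sketch below uses the standard two-predictor formulas, which is an assumption about how the tabled figures were obtained, and the function name is illustrative. The inputs are the present experiment's corrected values of .50, .55, and .64 from Table 1.

```python
import math

def two_predictor_regression(r_12, r_1c, r_2c):
    """Standardized weights and R for predicting a criterion from two predictors.

    r_12: correlation between the two predictors
    r_1c, r_2c: correlation of each predictor with the criterion
    """
    beta_1 = (r_1c - r_2c * r_12) / (1.0 - r_12 ** 2)
    beta_2 = (r_2c - r_1c * r_12) / (1.0 - r_12 ** 2)
    r_squared = beta_1 * r_1c + beta_2 * r_2c
    return beta_1, beta_2, r_squared, math.sqrt(r_squared)

# Corrected correlations from Table 1, present experiment:
# high-low .50, high-group .55, low-group .64.
beta_high, beta_low, r_sq, r_mult = two_predictor_regression(0.50, 0.55, 0.64)
print(round(beta_high, 2), round(beta_low, 2), round(r_sq, 2), round(r_mult, 2))
```

Under these formulas the results round to the beta weights, R², and R listed for the present experiment in Table 1.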

An inspection of Table 1 reveals certain 
discrepancies which require some comment. 
In the present experiment, the mean indi- 
vidual scores were considerably lower than 
for the previous experiment, which was ex- 
pected because the task was more difficult. 
This resulted in a slight increase in variance, 
too, which again could have been expected. 
The present group performance mean was 
only slightly higher, probably due to indi- 
vidual-task practice in changing set, not 
available to performers in the first experi- 
ment. The variances were identical for the 
group task in both experiments, an outcome 
consistent with expectations in that the task 
was exactly the same in both cases. 

The reliabilities compare very favorably in 
the two experiments, except for the group 
performance score. In this case, the present 
figure was lower than the previous one. The 
corrected correlations are close in the two ex- 
periments, although the discrepancies are in 
opposite directions for the low-group and high- 
group correlations, resulting in a more im- 
pressive difference between the beta weights, 


118 
Whereas the beta weights were fairly close in 
the first experiment, the low scores emerged 
in this research with a definite edge for pre- 
dicting group performance, although the dif- 
ference was still short of statistical signifi- 
cance. 

Looking at the comparative multiple cor- 
relation coefficients and their squares, it is 
evident that no startling improvement has 
occurred in the amount of group-performance-score
variance which can be predicted
from a knowledge of individual performance 
scores. The proportion of predicted vari- 
ance is still less than half. This figure was 
achieved only through using correlations corrected
for attenuation. The proportion of
variance practically predictable would be less.
It is perhaps worth mentioning that the 
multiple R values would have been even 
closer if the “group” score reliability in the 
second experiment had been higher. Since 
the figure obtained may be spuriously low, it 
would be well to consider the gain actually 
achieved with some caution. 

The results do not bear out the hypothesis 
entertained that prediction of group perform- 
ance scores can be increased markedly by 
making the individual task apparently more
like the group task in the actual operations.


Two other hypotheses, as yet untested, were
offered in the previous article to account for
the additional unpredicted variance. A
group task may involve some special traits
introduced by the necessity of cooperating
with another person and there may be interaction
effects among individuals over and
above stable trait influences. Attempts will
be made in further work to explore the nature
of this as yet unpredicted variance.


Summary 


A previously reported experiment was repeated
with an altered design to test the former
results and a hypothesis offered to account
for the fact that group performance
scores on a manual dexterity task could only
be predicted rather imperfectly from knowledge
of individual scores on a similar task.
The hypothesis was offered that the prediction
might be substantially improved by a
change in design to make the group and individual
tasks more comparable in the character
of the operations involved. The amount
of improvement in prediction obtained was
so slight as to require the rejection of the
hypothesis.


Received May 8, 1953.




Effects of Fatigue and Anxiety on Certain Psychomotor and 
Visual Functions *


Sherman Ross, T. A. Hussman, and T. G. Andrews 


University of Maryland


This study was an attempt to investigate
the degree of behavior decrement produced
by the experience of fatigue and threat
of bodily damage occasioned in the competitive
athletic sport of boxing. The dependent
variables chosen as possible indicators of behavior
decrement were: (a) steadiness score;
(b) body sway score; (c) body sway time
score; (d) tapping rate; and (e) critical
flicker frequency. The primary purpose of
the experiment was to determine whether or
not performance on each of the five dependent
variables changes significantly as a result
of intensive muscular exercise (fatigue) or
the fear of bodily injury (anxiety) or the interaction
of these conditions in the collegiate
competitive boxing situation.

There has been some speculation in the
literature about the damaging effects on behavior
of sustained head blows such as received
in continuous training in boxing (13).
In addition to these interests in boxing, such
a situation appears to offer a realistic condition
for systemic fatigue, high motivation, and
anxiety such as could not be attained under
the usual conditions of laboratory investigation.
These characteristics are not unlike
those which obtain in certain field conditions
of military operations and combat. In the
search for indicators of behavior decrement
for military purposes, the use was
made of boxing behavior to approximate these
characteristics of military importance.

The basis for the selection of the indicators
used in this investigation is described below
for each of the five dependent variables together
with a description of the manner of
testing.

* This experiment is one of a series of studies on behavior decrement performed under Contract No. DA ... between the Medical Research and Development Board, Office of The Surgeon General, Department of the Army and the University of Maryland. The opinions and assertions expressed in this report do not necessarily reflect the views of the Department of the Army.


Tests and Indicators Used 


Steadiness has been demonstrated to show 
changes under certain conditions of stress, and it 
has been reported to change with fatigue or work 
output (1, 4, 5, 18). Hand steadiness and
tremor have also been related to emotional stimu- 
lation (6, 7) and to certain conditions of motiva- 
tion (4). Because of these features, a test of 
hand steadiness was included among the depend- 
ent variables. For this test a target hole in a 
vertically adjustable metal plate was used. The 
subject’s task was to keep a 0.02 inch diameter 
stylus inserted into the 0.136 inch hole for 20 
seconds with the arm fully extended and unsupported.
The number of contacts with the edge
of the hole during this period served as the score. 

Body sway measurements have offered rather 
controversial results in the past when related to 
fatigue (11, 18) and to loss of sleep (5, 15). 
Because of the possible effects of head blows 
sustained in boxing, measures of body sway were 
obtained. For this purpose an arrangement similar
to that for steadiness was used. However,
in this case the stylus was longer and the hole 
diameter was 0.358 inch. The subject was re- 
quired to hold the stylus in the hole, but in this 
case without the aid of visual cues. When con- 
tact was made with the edge of the hole, a buzzer 
was automatically sounded as a signal to the sub- 
ject. Two scores were derived from this test: 
a body sway score of the number of contacts 
made in the 20 second period, and a body sway 
time score consisting of the total amount of 
time in seconds the stylus was in contact with 
the edge of the hole during the observation period.
These were treated as separate scores in
the analysis of the data.

Tapping tests serve as measures of rather simple
performance, but have been considered by
some investigators as useful indices of fatigue
(15, 16). Tapping has been shown to be related
to the decrement produced by high altitude
(9). The tapping test apparatus used here consisted
of the Dunlap modification of the Whipple
Tapping Board (3) and a 0.20 inch diameter
stylus. The tapping targets were two 3 inch
square brass plates separated by 1 inch of bakelite.
The subject was to tap alternately on the
plates as rapidly as possible for a period of 15 
seconds. The score used was the total number 
of taps on the plates in this allotted time. This 
brief time period was used as an attempt to di- 
minish the factor of learning, which has been 


shown to affect tapping scores (17). 


Critical Flicker Frequency has been used in 
several investigations on fatigue with contro- 
versial results (12, 14, 18). There has been 
some indication that CFF changes when the in- 
dividual is subjected to intensive strain (2). 
The apparatus used in the present study was the 
Krasno-Ivy Flicker Photometer (8), which is es- 
sentially an episcotister arrangement delivering 
square wave flashes of light on a 34 inch ground 
glass screen. The subject was seated 5 feet from 
the stimulus screen. A modified method of lim- 
its was used, in which the experimenter manipu- 
lated the stimulus from “fusion to flicker” and 
the subject responded at his threshold. Six 
“descending” trials were employed, the first two 
serving as practice. The score or threshold 
measure was the mean number of flashes for the 
last four trials. 

In each of the above tests only a brief period 
could be devoted to obtaining a score, since in 
many instances the subjects were being measured 
immediately after strenuous exercise and before 
they were covered, rubbed down, or bathed. 
Longer testing periods would have increased the 
reliabilities of the measures taken, but also might 
possibly have allowed the injurious effects of 
chilling the subjects.


Subjects 


Twenty-four male college students ranging in 
age from 19 to 25 years were used as subjects. 
Twelve of the group were experienced collegiate 
boxers and members of the University of Mary- 
land Boxing Team for 1952. The remaining sub- 
jects were members of a Physical Education class 
in boxing and should be classed as novice boxers. 
All subjects were in excellent physical condition, 


Independent Variables and Experimental 
Design 


Each of the 24 subjects was measured three
times on each of the tests under each of four 


conditions of the investigation. These four con- 
ditions were as follows: 


a. At rest, no previous strenuous exercise, no
expectation of going into the ring to fight.

b. Before fighting a three-round supervised
bout, no previous exercise.

c. After three rounds of very strenuous work- 
out on a heavy punching bag, not in the ring nor 
expecting to go into the ring. 

d. After fighting a three-round supervised bout 
with an opponent. 

These four conditions yield a basic 2 X 2 block 
of the experimental design, which is diagrammed 
in Table 1. It may be seen that this arrange- 
ment opposes the no-exercise conditions (F-0) 
to the heavy exercise conditions (F) for a test 
of the change in each variable as a result of 
fatigue. The test of change in each variable due 
to the anxiety occurring in the boxing situation 


is made by opposing the no-anxiety conditions
(A-0) to the high anxiety conditions (A). The
problem of fatigue in this arrangement is
straightforward. The problem of anxiety, however,
offers some question. In this regard it may
be said that all observations on and reports of
the men immediately before and after such competitive
boxing indicate severe tension and concern
over the threat of pain and bodily damage
or loss of the bout.

In order to minimize the effects of the order
of taking the tests in the battery, each subject
was randomly assigned to one of the 24 possible
orders of test administration, which he retained
throughout the experiment. Each subject
was measured 12 times on each test, three times
under each of the four experimental conditions.
The restriction placed upon the order of the conditions
was that the first time a subject took the
tests he was under the rest condition so that
giving the instructions did not interfere with the
condition nor the reverse.
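
The figure of 24 possible orders is what four tests would give (4! = 24). The sketch below assumes the battery counted as four tests because the body sway test supplies two of the five scores. The test labels, the function name, the seeded shuffle, and the choice to give every subject a different order are all illustrative assumptions and not the authors' stated procedure.

```python
import itertools
import random

# Assumed battery of four tests (body sway supplies two of the five scores).
TESTS = ("steadiness", "body sway", "tapping", "CFF")

def assign_orders(n_subjects, seed=0):
    """Give each subject a distinct, randomly chosen order of the four tests."""
    orders = list(itertools.permutations(TESTS))  # 4! = 24 possible orders
    if n_subjects > len(orders):
        raise ValueError("more subjects than available orders")
    rng = random.Random(seed)
    rng.shuffle(orders)
    return {subject: orders[subject] for subject in range(n_subjects)}

assignment = assign_orders(24)
print(len(set(assignment.values())))  # number of distinct orders in use
```

With 24 subjects and 24 orders, assigning without repetition uses every order exactly once, so each test appears equally often in each serial position.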


Results and Discussion 


The results are presented and analyzed
separately for each of the dependent variables
studied. In each case reference is made
to the paradigm presented in Table 1, and
the code letters used refer to the designated
experimental conditions and their combinations
as a system for presenting the obtained
means.

The experiment was conceived and designed
to allow analysis of the results in two separate
manners. The fact that each block of measures
taken on the twenty-four Ss is replicated
twice allows the use of a within-individual
estimate of variance to be used as an error

Table 1

Experimental Design Indicating the Conditions of
Measurement and Their Relationship

                                  FATIGUE
                          Absent (F-0)    Present (F)
ANXIETY   Absent (A-0)       n = 24          n = 24
          Present (A)        n = 24          n = 24


term to evaluate the effects of the treatment 
conditions on the variables in the population 
used. This error term contains variance of
two types, that associated with instrument 
error and individual diurnal variation. This 
analysis is intended to test the theoretical 
and perhaps somewhat obvious question of 
whether these variables are affected by the 
treatment conditions of fatigue and anxiety 
in the sample used.

The second analysis, which uses an esti- 
mate of the individual differences variance as 
the error term, is intended to answer the 
question of whether these test variables are 
useful as reliable indices of the independ- 
ent variables for practical application. Fre- 
quently the question of whether a variable 
changes significantly as a result of such con- 
ditions as fatigue and anxiety has been con- 
fused with the question of whether it may be 
used as an adequate indicator of these con- 
ditions. The two analyses employed test 
each of these questions in turn with what is 
felt to be the proper error term for each. 

The second analysis also provided a test of
the replications as a main effect, thus enabling
a check on possible changes due to learning,
the presence of which of course would
cast some question on their usefulness as
indicators. In all cases tests of homogeneity
of variance were satisfied. Table 2 presents
the means for each experimental condition
for each of the dependent variables used, according
to the paradigm in Table 1. Tables
3 and 4 present composite results of the tests
of significance. Reference is made to these
three tables in the description of results for
each type of experimental measure.
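
The practical difference between the two error terms can be seen by forming F ratios for a single effect. The sketch below takes the fatigue mean square for steadiness and the two error mean squares as they are listed in Tables 3 and 4. The function name is illustrative, and the simple ratio of effect mean square to error mean square is the assumed form of the test.

```python
def f_ratio(ms_effect, ms_error):
    """F ratio: effect mean square divided by the chosen error mean square."""
    return ms_effect / ms_error

MS_FATIGUE_STEADINESS = 30114.67  # fatigue conditions, steadiness (Tables 3 and 4)
MS_WITHIN_INDIVIDUALS = 143.14    # error term of Table 3, df = 192
MS_WITHIN_CELLS = 417.28          # error term of Table 4, df = 276

f_within_individuals = f_ratio(MS_FATIGUE_STEADINESS, MS_WITHIN_INDIVIDUALS)
f_within_cells = f_ratio(MS_FATIGUE_STEADINESS, MS_WITHIN_CELLS)
print(round(f_within_individuals, 1), round(f_within_cells, 1))
```

The same effect yields a much smaller F when individual differences are left in the error term, which is why the second analysis is the more demanding test of usefulness as an indicator.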

Table 2

Means of Experimental Results for Specified
Tests and Conditions

                                       Fatigue
                                 0              +            Mean
Steadiness       Anx.   0      60.44          84.05          72.25
                        +   (illegible)    (illegible)       72.22
                 Mean          62.0           82.46          72.23

Body Sway        Anx.   0      26.25          32.11          29.18
                        +      28.22          34.48          31.35
                 Mean          27.24          33.30          30.26

Body Sway Time   Anx.   0     590.36         638.24         614.30
                        +     490.89         667.70         579.30
                 Mean         540.62         652.97         596.80

Tapping          Anx.   0      77.54          83.96          80.75
                        +   (illegible)       82.62          82.18
                 Mean          79.64          83.29          81.47

CFF              Anx.   0      48.450         49.471         49.005
                        +      48.570         47.811         48.190
                 Mean          48.555         48.641         48.598


Steadiness. The total mean score for all subjects
under all conditions was 72.23; for condition
F-0 = 62.0, F = 82.46, A-0 = 72.25,
A = 72.22. The differences associated with
fatigue conditions are significant at the .001
level, as are individual differences. Anxiety
conditions effected no change in the measures.
There is a questionable interaction between
fatigue and anxiety, and the interaction between
fatigue and individual differences is
significant as is the interaction of anxiety
and individual differences. From these combinations
of interactions it appears that anxiety
may act here to increase the scores of
some individuals and decrease or not affect
the scores of others, thus destroying the main
effect. Anxiety then may be acting as a
sensitizer to fatigue effects in some instances
and a desensitizer in other instances. No
significant change was observed in successive

Table 3

Analyses of Variance for the Specified Experimental Variables, Using “Within Individuals”
as Measure of Experimental Error¹

                               MS for          MS for         MS for Body       MS for        MS for
Source                  df     Steadiness      Body Sway      Sway-Time         Tapping       CFF
Fatigue Conditions       1     30,114.67***    2,628.12***    908,664.34***     957.03***     (illegible)
Anxiety Conditions       1           .09         333.68**      88,235.00        148.78*       (illegible)
Individuals             23      3,048.33***      341.42***    435,366.00**      730.70***     (illegible)
Interactions:
  Fat. X Anx.            1        718.83*          3.56       299,215.59*       552.78**      (illegible)
  Fat. X Ind.           23        322.07**        64.24*       84,111.68         29.16        (illegible)
  Anx. X Ind.           23        301.88**        68.32**      54,416.36         94.64***     (illegible)
  Fat. X Anx. X Ind.    23        191.63          77.48**     141,400.42***      59.87        (illegible)
Error:
  Within Individuals   192        143.14          34.66        56,909.71         38.17         3.41
  (replications)
Total                  287

¹ The asterisks identify the conventional levels of significance: * for .05, ** for .01, and *** for .001.


measurements on the same individual under 
the same condition. The general conclusion 
here is that fatigue produces a general decrease
in steadiness.

Body Sway. The total mean score for all
conditions was 30.26; for condition F-0 =
27.24, F = 33.30, A-0 = 29.18, A = 31.35.
Fatigue effects very significantly increase
body sway, and anxiety appears also significantly
to produce the same results. Individual
differences are also significant here.
The interactions were insignificant when compared
with the highest order interaction as
recommended by McNemar (10). No significant
effects were obtained for repeated
measures under the same conditions. The


Table 4

Analyses of Variance for the Specified Experimental Variables, Using “Within Cells”
as Measure of Experimental Error¹

                               MS for          MS for         MS for Body          MS for       MS for
Source                  df     Steadiness      Body Sway      Sway-Time            Tapping      CFF
Fatigue Conditions       1     30,114.67***    2,628.12***    908,664.34           957.03**     (illegible)
Anxiety Conditions       1           .09         333.68*       88,235.00**         148.78       (illegible)
Replications             2        338.04          21.26       128,191.06***        354.59*      (illegible)
Interactions:
  Fat. X Anx.            1        718.83           3.56       299,215.59**         552.78*      (illegible)
  Fat. X Repl.           2          3.96          32.04        45,262.12**           3.94       (illegible)
  Anx. X Repl.           2        216.58          60.59        20,433.86             8.53       (illegible)
  Fat. X Anx. X Repl.    2         34.17           5.59    12,374,698.76***         61.86       (illegible)
Error:
  Within Cells         276        417.28          69.20         8,120.64            99.64       (illegible)
  (individual differences)
Total                  287

¹ The asterisks identify the conventional levels of significance: * for .05, ** for .01, and *** for .001.


general conclusion here is that fatigue and
anxiety both increase body sway and significantly
more for some individuals than for
others. The results obtained serve to corroborate
other studies on steadiness and body
sway (1, 4, 5, 6, 7, 11, 18).

Body Sway Time Scores. The total mean
score for all conditions was 596.80; for condition
F-0 = 540.62, F = 652.97, A-0 =
614.30, A = 579.30. Fatigue effects are very
reliable in their action to increase these scores.
However, replication measures under the
same conditions as well as differences among
individuals were also highly significant. Because
of these features and a very significant
interaction effect, there was judged to
be such a large amount of uncontrolled variance
that no definite conclusions are offered
for this measure of behavior decrement.

Tapping. The total mean score for all
conditions was 81.47; for condition F-0 =
79.64, F = 83.29, A-0 = 80.75, A = 82.18.
As in the case of the body sway time scores,
there is a large amount of uncontrolled variance
evidenced for the tapping measures.
The fatigue condition acted to increase tapping
reliably. However, replications within
the same conditions as well as individual differences
proved significant. There is probably
too great a learning factor allowed in
the conditions of measurement of tapping.
The results in general indicate that the tapping
test would be a sensitive indicator of
behavior decrement if the learning factor were
better controlled.

Critical Flicker Frequency. The total mean
score for all conditions was 48.598; for
condition F-0 = 48.555, F = 48.641, A-0 =
49.005, A = 48.190. Fatigue alone did not
produce any significant change, but anxiety
effects were highly significant in their decrease
of CFF. Individual differences were
significant and replication effects were not
significant.

Examination of the significance of the interactions
in the case of CFF suggests that
the same relationship holds between fatigue
and CFF that obtained between anxiety and
steadiness. The interaction between anxiety
and individuals is significant, indicating a
differential effect. The interaction of fatigue
and individuals is also present and suggests
that some individuals change in one direction
here while others change minimally or in the
other direction, thus reducing the main effect
that is predictable from fatigue. A more important
interaction is found between the main
effects of fatigue and anxiety. From the specific
results obtained with CFF, it appears
that this test may be a useful one for studies
on behavior decrement only in situations of
individual cases.
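
The condition means quoted in the paragraphs above are marginal averages of the four cell means in Table 2. A minimal sketch using the body sway cells (26.25, 32.11, 28.22, 34.48) follows. The function and key names are illustrative, and equal numbers of observations per cell are assumed.

```python
def marginal_means(cells):
    """Row, column, and grand means of a 2 x 2 table of equally weighted cell means.

    cells[a][f] is the mean for anxiety level a and fatigue level f (0 = absent).
    """
    (a0f0, a0f1), (a1f0, a1f1) = cells
    return {
        "A-0": (a0f0 + a0f1) / 2,
        "A": (a1f0 + a1f1) / 2,
        "F-0": (a0f0 + a1f0) / 2,
        "F": (a0f1 + a1f1) / 2,
        "total": (a0f0 + a0f1 + a1f0 + a1f1) / 4,
    }

body_sway = [[26.25, 32.11], [28.22, 34.48]]
body_sway_means = marginal_means(body_sway)
for label, value in body_sway_means.items():
    print(label, round(value, 3))
```

The computed values agree with the tabled body sway means of 29.18, 31.35, 27.24, 33.30, and 30.26 to within rounding of the cell entries.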

Interrelationships Among the Measures. In- 
tercorrelations were obtained among the av- 
erage scores of the tests as they appeared 
under the rest or control condition. These 
Pearson correlations were obtained only on 
the 24 Ss, and it was found that only two 
such correlations were significant. These were 
the correlations between steadiness and body 
sway (r = .550) and between steadiness and 
the body sway time scores (r = .407). 
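
With only 24 subjects a correlation must be fairly large to reach significance. The sketch below applies the usual t test for a Pearson r, assuming a two-tailed test at the .05 level with N - 2 = 22 degrees of freedom (tabled critical value 2.074). The function name is illustrative, and the report does not state which test of significance was applied to these correlations.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)

T_CRITICAL_05_DF22 = 2.074  # two-tailed .05 critical value of t for 22 df

t_values = {}
for label, r in (("steadiness / body sway", 0.550),
                 ("steadiness / body sway time", 0.407)):
    t_values[label] = t_for_correlation(r, 24)
    print(label, round(t_values[label], 2), t_values[label] > T_CRITICAL_05_DF22)
```

Under these assumptions both reported correlations exceed the critical value, the second only narrowly.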

Body sway appears to have a factor in common
with steadiness, and this is possibly the
reason that body sway measures were found 
to be adequate indices of the stress involved 
in this investigation. It is also possible that 
the body sway test involves a factor or fac- 
tors not present in the steadiness test, because 
the former was found to be a significant indicator
of anxiety effects, while this did not
hold for the steadiness test.


There was one source of variation that 
was impossible to control, namely the actual 
amount of bodily damage or physical punishment
sustained by each of the Ss during the
conditions of competitive boxing. It was felt 
that some system should be instituted that 
would allow a possible check on the validity 
of some of the experimental assumptions, and 
so correlations were computed between the 
number of head blows received and scores on 
each of the tests. The estimates of head 
blows were furnished by Mr. Frank Cronin, 
the University Boxing Coach, who observed 
every bout and tallied blows on a prear- 
ranged data form. None of the correlations 
was found to be statistically significant on a 
one-tail t test, which is the appropriate test
considering the hypothesis in this case. It 
would appear that within the limits of the
measuring techniques and the design of the
study, the number of head blows sustained 
had little or no effect on the test scores of 
the Ss. 

As part of another investigation, to be re- 
ported elsewhere, protracted boxing experi- 
ence with its attendant number of head blows 
produced no reliably indicated changes in 
the electroencephalographic records of ama- 
teur boxers, some of whom were from among 
the Ss used in the present investigation. 

As far as the present results are concerned, 
it appears that measures of steadiness more 
than the other variables tested satisfy more 
of the criteria of reliability and predictability 
to be used as indicators of behavior decre- 
ment. Hand steadiness serves for indications 
of fatigue, and body sway which is a form 
of steadiness measure serves for indication of 
either fatigue or the type of anxiety produced 
in this study. These suggestive results may 
be taken as recommendations for further in- 
vestigations under a greater variety of stress 
conditions. 

The other variables employed in this study 
may be made into more useful measures for 
studies of stress if their trial-to-trial variation
and very wide individual differences may
be diminished by deriving scores through 
other techniques, reducing practice effects, 
and otherwise accounting for the larger rela- 
tive amounts of variability now classifiable 
as experimental error. 


Summary and Conclusions 


As part of a larger research program on
indicators of behavior decrement, this experi- 
ment investigated the comparative value of 
several selected measures of behavior decre- 
ment under conditions of fatigue and anxiety. 
The dependent variables chosen as possible 
indicators of behavior decrement were: (a) 
steadiness; (b) body sway; (c) body sway 
time score; (d) tapping rate; and (e) criti- 
cal flicker frequency. The primary purpose 
of the experiment was to determine whether 
or not performance on each of the five de- 
pendent variables changed significantly as a 
result of intensive muscular exercise (fatigue) 
or the fear of bodily injury (anxiety) or the 
interaction of these conditions in the collegi- 
ate competitive boxing situation. 


S. Ross, T. A. Hussman, and T. G. Andrews 


Twenty-four boxers were measured under 
the following four conditions: at rest; after 
heavy exercise; before fighting; and after 
fighting. The tests were administered three 
times to each subject under each of the ex- 
perimental conditions. The analysis of vari- 
ance technique was used to test the changes 
in each variable as a function of the independent variables. Two
separate analyses of the results were made: (1) using “within
individuals”; and (2) using “within cells” as the measure of
experimental error. The results permit the following major
conclusions:


1. Hand steadiness scores decreased significantly with fatigue, but
not with the anxiety conditions. No significant change was observed in
successive testing on the same individual under the same conditions.

2. Fatigue and anxiety significantly increased body sway scores.

3. Body sway time scores were found to be unreliable, although the
fatigue conditions significantly increased these scores.

4. Tapping was found to be unreliable, possibly due to a learning
factor. Significant changes were found, however, in the scores as a
result of both fatigue and anxiety.

5. Critical flicker frequency thresholds were shown to decrease
significantly as a result of anxiety. The reliability of the measure
was high, and it is felt that it may be useful in studies of behavior
decrement in situations of individual cases.

6. No relationship was found between the dependent variables used and
the number of head blows received by the subjects during a boxing
bout.

7. Measures of steadiness more than the other variables tested satisfy
the criteria for indicators of behavior decrement. Hand steadiness
serves as an indicator of fatigue, and body sway (which is a form of
steadiness measure) may serve as an indicator of either fatigue or the
type of anxiety produced in the experiment. The remaining variables
tested in this experiment may be made into more useful measures for
studies of the effects of stress if trial-to-trial variation and the
very wide individual differences exhibited are diminished.


Received June 8, 1953. 


References

1. Bousfield, W. W. The influence of fatigue upon tremor. J. exp.
Psychol., 1932, 15, 104-107.

2. Brozek, J. and Keys, A. Flicker fusion frequency as a test of
fatigue. J. indust. Hyg., 1944, 26, 169-174.

3. Dunlap, K. Improved form of steadiness tests and tapping plate.
J. exp. Psychol., 1921, 4, 430-433.

4. Eaton, M. T. The effect of praise, reproof and exercise upon
muscular steadiness. J. exp. Educ., 1933, 2, 44-59.

5. Edwards, A. S. Effects of the loss of one hundred hours of sleep.
Amer. J. Psychol., 1941, 54, 80-91.

6. Edwards, A. S. Finger tremor and battle sounds. J. abnorm. soc.
Psychol., 1948, 43, 396-399.

7. Kellogg, W. N. The effect of emotional excitement upon muscular
steadiness. J. exp. Psychol., 1932, 15, 142-165.

8. Krasno, L. R. and Ivy, A. C. The response of the flicker fusion
threshold to nitroglycerin and its potential value in the diagnosis,
prognosis, and therapy of subclinical and clinical cardio-vascular
disease. Circulation, 1950, 1, 6, 1267-1276.

9. Malmo, R. B. and Finan, J. L. A comparative study of eight tests
in the decompression chamber. Amer. J. Psychol., 1944, 57, 389.

10. McNemar, Q. Psychological statistics. New York: John Wiley and
Sons, 1949. P. 288.

11. Ryan, A. H. and Warner, M. The effects of automobile driving on
the reactions of the driver. Amer. J. Psychol., 1936, 48, 403-421.

12. Simonsen, E. and Enzer, E. Measurement of fusion frequency of
flicker as a test of fatigue of the central nervous system:
observations on laboratory technicians and office workers. J. indust.
Hyg., 1941, 23, 83-89.

13. Steinhaus, A. H. Boxers brains swapped for medals. J. of the
Amer. Assn. for Health, Physical Ed. and Recreation, 1951, 8, 12-14.

14. Tyler, D. B. The fatigue of prolonged wakefulness. Fed. Proc.,
1947, 6, 218.

15. Warren, N. and Clark, B. Blocking in mental and motor tasks
during a 65-hour vigil. J. exp. Psychol., 1937, 21, 97-105.

16. Wells, F. L. A neglected measure of fatigue. Amer. J. Psychol.,
1908, 19, 345-358.

17. Wells, F. L. Normal performances on the tapping test before and
during practice, with special reference to fatigue phenomena. Amer.
J. Psychol., 1908, 19, 437-483.

18. Wulfeck, W. H. Fatigue and hours of service of interstate truck
drivers. II. Psychomotor reactions. Publ. Hlth. Bull., Washington,
1941, No. 265, 135-177.


The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Dimensional Analysis of Motion: VII. Extent and Direction of
Manipulative Movements as Factors in Defining Motions


Shelby J. Harris and Karl U. Smith 


University of Wisconsin 


In earlier investigations the problems of ex- 
tent and direction of travel movements as 
factors in determining the duration of the 
manipulative and travel components of mo- 
tion have been investigated (5, 6). Contrary 
to assumptions and observations in the fields 
of human engineering and time and motion 
study, these studies indicate that greater travel distances increase
the duration of both the travel and manipulation components of a
motion. The same experiments also show that the direction of travel
movement affects only the travel time of the motion. The present
experiment extends this line of investigation by studying the effects
of varying the extent and direction of manipulation on the component
movements of travel and manipulation in the motion pattern.


Methods and Procedure 


The apparatus (Figure 1) used in this study consists of an electronic
motion analyzer which has been named the analytic reactometer (3).
This device is designed in terms of two main features: (a) control of
the space dimensions of the motion pattern; and (b) separate
measurement of the manipulative and travel components of motion
through the use of special electronic relays (4). The electronic
methods of motion analysis, as adapted to the present apparatus, are
based on the principle of making the human operator a key in a circuit
consisting of the performance situation, the operator, an electronic
relay and precision time clocks. When the subject operates one of the
switches of the apparatus, he activates a vacuum tube relay causing
the manipulation-time clock to run as long as he is in contact with
the switch. When he ceases contact with the switch, another relay is
thrown, causing the travel-time clock to run. This clock is stopped
and the manipulation-time clock started again as soon as another
switch is touched.

¹ This research has been supported by funds voted by the Legislature
of the State of Wisconsin, and assigned by the Graduate School
Research Committee, The University of Wisconsin.
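
The two-clock timing principle described above can be sketched in software. This is only an illustrative model, not a description of the relay circuit: the event format (a time in seconds paired with "touch" or "release") and the function name are assumptions introduced for the example.

```python
from typing import List, Tuple

def split_times(events: List[Tuple[float, str]]) -> Tuple[float, float]:
    """Accumulate manipulation time (hand on a switch) and travel time
    (hand between switches) from a time-ordered list of contact events."""
    manipulation = 0.0
    travel = 0.0
    last_change = None   # time of the most recent change of contact state
    in_contact = False
    for t, kind in events:
        if kind not in ("touch", "release"):
            raise ValueError(f"unknown event kind: {kind!r}")
        if kind == "touch" and not in_contact:
            # The travel clock stops and the manipulation clock starts.
            if last_change is not None:
                travel += t - last_change
            in_contact, last_change = True, t
        elif kind == "release" and in_contact:
            # The manipulation clock stops and the travel clock starts.
            manipulation += t - last_change
            in_contact, last_change = False, t
    return manipulation, travel

# Hypothetical record of two switch operations.
events = [(0.0, "touch"), (0.5, "release"), (0.75, "touch"), (1.25, "release")]
manipulation_time, travel_time = split_times(events)
print(manipulation_time, travel_time)
```

Keeping a single "last change" timestamp mirrors the apparatus, in which exactly one of the two clocks is running at any moment once the task has begun.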
[Figure 1: a schematic diagram of the analytic reactometer, with
labels for the rotary switches, control panel, terminal control
switches, electric clocks, and control switches.]

Fig. 1. Diagram of the analytic reactometer showing the arrangement of
controls on the panel and the timing mechanism. The inset illustrates
the design of the individual manual control. The 120 degree extents,
clockwise and counterclockwise, which were used in the experiment are
not shown on the dial.

The planned performance situation employed in the experiment consists
of a control panel, 45.7 cm. square, on which are mounted 25 rotary
switches. These switches are mounted in five rows of five switches and
spaced by a distance of 7.6 cm. Each switch has 17 possible settings
spaced at intervals of 20 degrees. Settings of 40, 80, and 120 degrees
clockwise and 40, 80, and 120 degrees counterclockwise are marked on
the dials.

In the design of the experiment three extents of manipulative
movement, 40, 80, and 120 degrees, and two directions of such
movement, clockwise and counterclockwise, were used, thus providing a
total of six separate conditions. The over-all pattern of movement was
the same on all tests with the subject starting at the top of the
control panel and working from left to right through the five rows of
switches. Each subject performed each of the six tests once at
approximately the same time on seven successive days. All performances
were carried out with the right hand.

A total of 42 right-handed men and women students from the elementary
classes in psychology at the University of Wisconsin served as
subjects.

In order to control the effects of the sequence and ordinal position
of the six experimental conditions a 6 x 6 latin square with seven
replications of the same square was used. Subjects were assigned to a
given sequence of tests in order of appearance at the experiment and
were required to repeat the same sequence on seven successive days.
Separate analyses of variance were performed on the travel and
manipulation time data for the first and seventh days of the
experiment only. Performances on these days are considered to
represent unskilled and skilled performance. The choice of seven days
of practice was arbitrary and, therefore, the term “skilled” is not
intended to imply a maximum level of performance.

Learning curves of the component movements for the various tests over
the seven days were also constructed.

During the testing procedure the subject was seated on a chair and his
height adjusted so that his eye level was approximately equal to the
top row of switches. The subject was instructed to move the chair
toward or away from the control panel to a comfortable position but
required to keep it centered in front of the panel throughout the test
session. Prior to each of the individual tests on the first day, the
subjects were given a practice trial consisting of turning the first
10 switches to the appropriate position. Each subject was instructed
to turn the switches to the appropriate position as rapidly as
possible and at the same time to be careful to position switches
accurately. Although error data were not used in the analysis of the
data, they were recorded in order to discourage subjects from becoming
careless with regard to accuracy.
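
The counterbalancing scheme in the design (a 6 x 6 latin square replicated seven times for 42 subjects) can be illustrated with a short sketch. The particular square used in the experiment is not given in the text, so the cyclic square below is an assumption, as are the condition labels, the function names, and the rule of assigning subjects to rows in rotation by order of appearance.

```python
from collections import Counter
from typing import Dict, List

CONDITIONS = ["40 CW", "80 CW", "120 CW", "40 CCW", "80 CCW", "120 CCW"]

def cyclic_latin_square(items: List[str]) -> List[List[str]]:
    """Build an n x n latin square by cyclically shifting the item list."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_sequences(n_subjects: int, square: List[List[str]]) -> Dict[int, List[str]]:
    """Assign subjects, in order of appearance, to the rows of the square in rotation."""
    return {s: square[s % len(square)] for s in range(n_subjects)}

square = cyclic_latin_square(CONDITIONS)
assignment = assign_sequences(42, square)

# Each condition occupies each ordinal position once per replication.
for position in range(len(CONDITIONS)):
    assert {row[position] for row in square} == set(CONDITIONS)

subjects_per_sequence = Counter(tuple(seq) for seq in assignment.values())
print(sorted(subjects_per_sequence.values()))
```

With 42 subjects and six sequences, each sequence is followed by seven subjects, which corresponds to the seven replications of the square.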


Results


Figure 2 shows the learning curves for travel and manipulation
movements for the three conditions involving clockwise direction of
manipulation. Analogous curves for counterclockwise direction were
obtained, but since the two sets of curves are much the same, only
those for clockwise rotation are shown. It is apparent that over the
seven-day period the manipulation-time scores show considerably
greater improvement than the travel-time scores. The major difference
in the rate of improvement between the two motion components is during
the first three days of practice. Quite similar practice effects are
found for the two component movements over the last four days of the
experiment. These changes in performance are shown in Table 1 in terms
of per cent change from day one to day three, from day three to day
seven, and from day one to day seven. An analysis of variance
performed on the data for days one and seven indicates that the
changes in duration of both travel and manipulation movements over the
seven-day period are significant at the .001 level of confidence.²

[Figure 2: mean time in seconds plotted against days of practice, with
separate curves for the manipulation and travel components at each
extent of manipulation.]

Fig. 2. Learning curves for 40, 80, and 120 degrees extent of
manipulation in a clockwise direction. The mean times for 42 subjects
are shown separately for the manipulation and travel components of
motion. Analogous curves for counterclockwise direction are similar to
those shown.

Table 1

Per Cent Change in Manipulation and Travel Time from Day 1 to Day 3,
Day 3 to Day 7, and Day 1 to Day 7 under the Different Experimental
Conditions

                        Manipulation                      Travel
                  Day 1-   Day 3-   Day 1-    Day 1-       Day 3-   Day 1-
Exp. Cond.        Day 3    Day 7    Day 7     Day 3        Day 7    Day 7

40 Deg. Right     16.08     8.75    23.43     1.20          5.92    [illegible]
80 Deg. Right     21.11     8.53    27.84      .83          8.00    [illegible]
120 Deg. Right    20.00    10.14    28.11     1.49         10.41    [illegible]
40 Deg. Left      16.13     7.09    22.08     [illegible]   7.53    7.25
80 Deg. Left      20.23     5.80    24.86     [illegible]  10.78    [illegible]
120 Deg. Left     15.19     6.27    20.50     2.96         11.05    13.[illegible]
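
The per cent changes reported for the three intervals are arithmetically linked, which offers a simple check on the tabled manipulation values. The sketch below assumes that each per cent change is the reduction in mean time relative to the earlier day; the text does not state the formula, so this definition and the function names are assumptions. Under that definition the day 1 to day 7 change follows from compounding the two shorter intervals.

```python
def percent_change(before: float, after: float) -> float:
    """Per cent reduction in mean time from an earlier day to a later day."""
    return 100.0 * (before - after) / before

def compound(p_first: float, p_second: float) -> float:
    """Overall per cent reduction implied by two successive per cent reductions."""
    return 100.0 * (1.0 - (1.0 - p_first / 100.0) * (1.0 - p_second / 100.0))

# Manipulation columns of Table 1: (Day 1-Day 3, Day 3-Day 7, Day 1-Day 7).
MANIPULATION = {
    "40 Deg. Right": (16.08, 8.75, 23.43),
    "80 Deg. Right": (21.11, 8.53, 27.84),
    "120 Deg. Right": (20.00, 10.14, 28.11),
    "40 Deg. Left": (16.13, 7.09, 22.08),
    "80 Deg. Left": (20.23, 5.80, 24.86),
    "120 Deg. Left": (15.19, 6.27, 20.50),
}

for condition, (d1_d3, d3_d7, d1_d7) in MANIPULATION.items():
    implied = compound(d1_d3, d3_d7)
    print(f"{condition}: implied {implied:.2f}, tabled {d1_d7:.2f}")
```

The implied and tabled day 1 to day 7 values agree to within rounding for all six manipulation rows, which supports the assumed definition.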

The data on the effects of direction and extent of manipulation were
first examined for homogeneity of variance between the different
experimental conditions. A Bartlett chi-square test for homogeneity of
variance applied to the time-score data for days one and seven proved
significant, thus necessitating a logarithmic transformation of the
data. All of the analyses were performed on the transformed data.
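
The reasoning behind the transformation can be illustrated with a small sketch. The data below are invented for the example (three groups whose spread grows in proportion to their mean); the function name is hypothetical and the statistic is the usual textbook form of Bartlett's test, not a reproduction of the authors' calculation.

```python
import math
from statistics import variance
from typing import List

def bartlett_statistic(groups: List[List[float]]) -> float:
    """Bartlett's chi-square statistic for homogeneity of variance (df = k - 1)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]   # sample variances, n - 1 denominator
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1.0 + (sum(1.0 / (n - 1) for n in sizes)
                        - 1.0 / (n_total - k)) / (3.0 * (k - 1))
    return numerator / correction

# Hypothetical time scores: each group is the same pattern scaled by 1, 2, and 4,
# so the standard deviation rises in proportion to the mean.
base = [0.8, 0.9, 1.0, 1.1, 1.25]
raw_groups = [[scale * x for x in base] for scale in (1.0, 2.0, 4.0)]
log_groups = [[math.log(x) for x in group] for group in raw_groups]

print(f"raw scores: {bartlett_statistic(raw_groups):.2f}")
print(f"log scores: {bartlett_statistic(log_groups):.2f}")
```

When variability is proportional to the mean, as is common with time scores, taking logarithms turns the multiplicative differences into additive ones and the group variances become equal, so the statistic falls to essentially zero.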

In order to evaluate the effects of the various sequences of tests,
analyses of variance were performed on the travel and manipulation
data from the latin squares for days one and seven. In no instance was
the sequence of tests a significant variable.

Figures 3 and 4 show the relation between the mean travel and
manipulation times and the extent and direction of manipulation for
days one and seven respectively. The means shown in the figures have
been computed on the transformed scale and then converted back to the
original scale. As may be seen from the graphs, the mean manipulation
times increase considerably with increased extent of manipulation for
both clockwise and counterclockwise directions of movement. The
function relating the two is approximately linear. With the exception
of the [illegible] degree movement on day one, the mean manipulation
times for clockwise direction are consistently less than the
comparable figures for counterclockwise direction. The mean travel
times also increase with increased extents of manipulation in both
directions, but the increase is not as pronounced as it is for
manipulation times. Inspection of the travel time curves suggests that
the relation between mean performance and extent of manipulation also
approximates linearity.

[Figure 3: mean time in seconds plotted against extent of manipulation
in degrees (40, 80, 120), with separate curves for manipulation right,
manipulation left, travel right, and travel left.]

Fig. 3. Mean times for the 42 subjects for 40, 80, and 120 degrees
extent of manipulation in clockwise and counterclockwise directions.
The means shown are for the first day of the experiment.

Summary tables for the analyses of variance performed on the relations
discussed above are shown in Table 2. Extent of manipulation is
significant at the .05 level or greater for both the travel and
manipulation components on both days one and seven. Direction of
manipulation is significant for the manipulation component on day
seven. Clockwise direction is significantly superior to
counterclockwise direction in this instance at the [illegible] level.
Direction does not have a significant effect on the manipulation and
travel times on day one, or on the travel component on day seven. None
of the direction by extent interactions is significant.

Table 2

Summary of Analysis of Variance for Direction and Extent of
Manipulation for Travel and Manipulation Components of Motion

Source of              Manipulation              Travel
Variation              df     F                  df     F

Day One
  Direction             1     [illegible]         1     [illegible]
  Extent                2     72.91***            2     14.07***
  Interaction           2     [illegible]         2     [illegible]
  Error                 [illegible]               [illegible]
Day Seven
  Direction             1     [illegible]         1     [illegible]
  Extent                2     [illegible]         2     [illegible]
  Interaction           2     [illegible]         2     [illegible]
  Error               246                       246

* Significant at .05 level.
** Significant at .01 level.
*** Significant at .001 level.

² The raw data and the summaries for the analysis of variance for this
experiment are on file at the University of Wisconsin in the master’s
thesis of Mr. Shelby Harris entitled “Dimensional Analysis of Motion:
The Factors of Direction and Extent of Manipulative Movement in
Motion.”
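
The means plotted in Figures 3 and 4 are described as computed on the transformed scale and then converted back to the original scale. A minimal sketch of that step follows; the times are hypothetical, the function name is an assumption, and base-10 logarithms are used only for convenience because the base does not affect the result.

```python
import math
from typing import List

def back_transformed_mean(times: List[float]) -> float:
    """Average the logarithms of the scores, then convert back to seconds."""
    logs = [math.log10(t) for t in times]
    return 10 ** (sum(logs) / len(logs))

times = [1.0, 10.0, 100.0]   # hypothetical time scores in seconds
arithmetic_mean = sum(times) / len(times)
print(back_transformed_mean(times), arithmetic_mean)
```

A mean obtained this way is the geometric mean of the raw scores. It is never larger than the arithmetic mean and is less affected by a few very slow performances.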


Summary and Conclusions 


Forty-two subjects were tested on a task 
involving repetitive switch turning under six 
different experimental conditions. These con- 
ditions consist of three extents of manipula- 
tion, 40, 80, and 120 degrees, and two direc- 
tions of manipulative movement, clockwise 
and counterclockwise. Special devices, in- 
volving electronic motion analysis techniques 
and a special planned work situation, are 
used to obtain separate measurement of the 
travel and manipulation components of mo- 
tion under controlled conditions. Each sub- 
ject performs one trial under each of the ex- 
perimental conditions on seven successive 
days. 

Learning curves for the travel and manipulation components of motion
are presented. Analyses of variance, performed on the data for days
one and seven, are summarized to indicate the significance of
differences in the duration of travel and manipulation movements in
relation to the direction and extent of manipulation.

[Figure 4: mean time in seconds plotted against extent of manipulation
in degrees (40, 80, 120), with separate curves for manipulation right,
manipulation left, travel right, and travel left.]

Fig. 4. Mean times for the 42 subjects for 40, 80, and 120 degrees
extent of manipulation in clockwise and counterclockwise directions.
The means shown are for the seventh day of the experiment.

The results of the study may be summarized as follows:

1. Manipulation movements show a considerably greater improvement due
to practice than travel movements. This differential learning effect
is evident primarily over the first three days of practice. The change
in performance from day one to day seven is highly significant for
both motion components.

2. Duration of both manipulation and 
travel time is significantly increased with 
greater extents of manipulative movement 
and, at least for the extents investigated, the 
relations are approximately linear. 

3. Clockwise direction of manipulative 
movement is observed to be significantly 
superior to counterclockwise movements for 
the manipulation motion component on day 
seven. Direction of manipulation is not re- 
lated to duration of manipulation time on 
day one, nor were travel times affected by 
the direction of manipulation on day one or 
day seven.

Previous studies (6) have shown that the durations of both the travel
and manipulation movements are related to the distance of travel
movement. Inasmuch as the observations reported here show that
increasing the extent of manipulation lengthens the duration of both
manipulation and travel, it appears that varying the extent of any of
the components of motion will have effects on other component
movements involved in the pattern. Direction of travel movement has
previously been shown to affect only the travel component of the
motion (5). Thus, it appears that direction of movement, at least
under conditions investigated thus far, influences only the component
of motion within which the directional factor occurs. Further research
to determine more precisely under what conditions direction becomes a
relevant variable in determining the duration of movement is needed.

The effort to systematize the study of human motions in industry and
in some phases of human engineering has led both psychologists and
engineers to assume that varying the extent of movement has no
influence on the duration of such movement (1, 2). These assumptions
have been proven, through dimensional and component motion analysis,
to be erroneous for both manipulative and travel components of human
manual motion. Both manipulation and travel times vary significantly
with increasing extent of motion. In addition, the increase in extent
of one of these component movements in a complex task produces,
through interaction of the different movements, significant increase
in duration of the other component movement in the task.


Received June 18, 1953. 


References 


1. Barnes, R. M. Motion and time study (3rd Ed.). New York: John
Wiley, 1949.

2. Ellson, D. G. The application of operational analysis to human
motor behavior. Psychol. Rev., 1949, 56, 9-17.

3. Harris, S. J. and Smith, K. U. Dimensional analysis of motion. V.
An analytic test of psychomotor ability. J. appl. Psychol., 1953, 37,
136-142.

4. Smith, K. U. and Wehrkamp, R. A universal motion analyzer applied
to psychomotor performance. Science, 1951, 113, 242-244.

5. Von Trebra, Patricia A. and Smith, K. U. Dimensional analysis of
motion. IV. Transfer effects and direction of movement. J. appl.
Psychol., 1952, 36, 348-353.

6. Wehrkamp, R. and Smith, K. U. Dimensional analysis of motion. II.
Travel distance effects. J. appl. Psychol., 1952, 36, 201-[illegible].

The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Discussion of Gilliland and Newman’s “The Humm-Wadsworth
Temperament Scale as an Indicator of the ‘Problem’ Employee” ¹


D. G. Humm and Kathryn A. Humm 


Humm Personnel Consultants, Los Angeles, California 


The first question to be raised is that of the [illegible] of the
article in question. The methodology reported does not represent the
methodology recommended for using the Humm-Wadsworth Temperament
Scale in the appraisal of employees and applicants for employment. If
the article had been given some such title as “The Integration Index
and Component Control Measures Computed from the Humm-Wadsworth
Temperament Scale as Indicators of the ‘Problem’ Employee,” we would
be less dissatisfied with it.

It is implied by Gilliland and Newman that they used Humm’s
procedures in classifying their subjects according to “risk,” as
described in our study of Los Angeles policemen ² and as discussed in
personal conference. In the [illegible] way, we recommend evaluating
Humm-Wadsworth findings for each subject tested by considering each of
the following: (1) the raw scores, corrected for response-bias, in
comparison with the scores of the subjects of the original
standardization study; ³ (2) the extent of response-bias itself, since
atypical response-bias has been found to be an indicator of tendencies
to problem behavior; (3) the positions of the seven components in the
distributions of the scores of employed subjects, but without any
implication that conformity to the central tendency is necessarily
desirable; (4) the relationship of the Normal component to each of the
other components (the component control measures) and to the
temperamental pattern as a whole (the integration index); ⁴ and (5)
finally, the temperamental pattern as a whole, derived from all of the
measures previously mentioned, and indicating which components are
likely to be conspicuously manifested in the subject’s behavior and
whether their manifestations will be desirable or undesirable in the
situation for which the subject is being considered.

¹ Gilliland, A. R. and Newman, [initials illegible]. The
Humm-Wadsworth Temperament Scale as an indicator of the “problem”
employee. J. appl. Psychol., [year, volume, and pages illegible].

² Humm, D. G. and Humm, Kathryn A. Humm-Wadsworth Temperament Scale
appraisals compared with criteria of job success in the Los Angeles
Police Department. J. Psychol., 1950, 30, 63-75.

³ [Illegible] seven pairs of groups, rather than [illegible], and they
were not “relatively pure”; [illegible] regression technique had to be
[illegible] the data. See: Humm, D. G. and Wadsworth, G. W., Jr. The
Humm-Wadsworth Temperament Scale. Amer. J. Psychiat., 1935, 92,
163-[illegible].

In general personnel work, we assign a risk 
rating on the basis of Humm-Wadsworth find- 
ings alone only when those findings are so 
unmistakably unfavorable as to constitute an 
insurmountable handicap even if all findings 
concerning ability should be found to be fa- 
vorable. In our report of the study of Los 
Angeles policemen, we attempted to make it 
clear that we used the Humm-Wadsworth 
findings alone, without partialling out other 
factors related to job success, because the 
policemen in question had been pre-selected 
by the civil service procedure. 

When Gilliland and Newman classified their subjects on a scale which
they do not identify and which we cannot recognize for the Integration
Index and the Component Control Measures and then assigned risk
ratings on the basis of Very Good for all ratings above five and Very
Poor whenever any two ratings were as low as one, they were using a
procedure we have never recommended and strongly disapprove.

The explanations offered by Gilliland and Newman for the outcome of
their study seem to us not to follow from the data reported, i.e.:
“(1) the test may not adequately measure the components it purports to
measure”: no data are presented to indicate whether or not the
behavior of the subjects differed from the behavior that might have
been predicted from the Humm-Wadsworth results; “(2) these components
may not be essential to success in this industry”: the study
investigated only a specific set of measures and did not do this in a
way which could justify such a conclusion; “(3) the company cannot
distinguish between satisfactory and unsatisfactory workers”: no data
are presented of the procedures used by the company for determining
satisfactory work or for deciding to discharge an employee.

⁴ Humm, D. G., and Humm, Kathryn A. Measures of mental health from the
Humm-Wadsworth Temperament Scale. Amer. J. Psychiat., 1950, 107, 6,
442-449.

The only conclusion we are able to draw from this study is that it
supports our own contention that over-simplified procedures are
inadequate for appraising workers, but that it offers no evidence as
to the effectiveness of the Humm-Wadsworth, properly used, as one of
the tools for personnel appraisal.


Received July 31, 1953. 
Published out-of-turn by the editor. 




The Journal of Applied Psychology
Vol. 38, No. 2, 1954


Applied Psychology in Action 


Comment on Word Meaning 


Fred L. Wells 


Department of Hygiene, Harvard University 


In the December 1953 issue of the Journal of Applied Psychology, Dr.
H. D. Hadley has a thoughtful note: “The Non-Directive Approach in
Advertising Appeals.” For the text, I wonder if there would be any
interest in a distinction between credibility and credulity. The
latter term seems the one fitting Dr. Hadley’s context better, but the
word used is credibility (page 496, line 5 from end, page 497, bottom
of column one). There is apparently an “obsolete” use of credibility
in Dr. Hadley’s sense. (See Webster.) If interpreted in the current
usage, this makes the author’s meaning difficult to follow.


The Journal of Applied Psychology
Vol. 38, No. 2, 1954


A Note on “The Non-Directive Approach in Advertising Appeals” ¹


Mary Epstein 
George Peabody College for Teachers 


The December 1953 issue of J. appl. Psychol. [illegible] an article,
the aim of which was to point out some similarities between certain
types of [illegible] and advertising techniques. One of the
conclusions reached, that “the non-directive technique is quite
comparable to the inferred technique in advertising . . . ,” needs
further elaboration.

Showing the benefits of a product, without intention to sell (the
inferred technique) seems, according to some standards, superior to
the direct appeal, where the advertiser tells the [illegible] to buy
the product. Often, the direct appeal involves threat to the buyer’s
values, particularly when his attention is called to the fact that the
purchasing of another product than the one advertised may lead to
various undesirable results. The inferred technique minimizes the
effects of threat by emphasizing the acceptability of the product and
by associating it “with very acceptable things, persons or events.”
A comparison between non-directive therapy ² and inferred advertising
can possibly be made concerning the attempt to reduce threat. The
advisability of reducing threat, in any field of human endeavor, is
psychologically sound. Further comparison, however, can only be drawn
by doing injustice to the basic principles of client-centered therapy.

¹ Hadley, H. D. The non-directive approach in advertising appeals.
J. appl. Psychol., 1953, 37, 496-498.

² Rogers, C. R. Client-centered therapy. Boston: Houghton-Mifflin Co.,
1951.

A closer examination of the assumptions 
on which non-directive therapy rests reveals 
that a belief in the client’s ability to develop 
his own value system is a sine qua non of 
successful therapy. To facilitate this process, 
the therapist tries to minimize, as much as 
possible, the effects his own value system 
might have on that of the client. 

The principle of non-interference is not ap- 
plicable in the field of advertising, because, 
carried to its logical conclusion, it would 
mean that the advertiser not sell at all. 
Placed in the non-directive framework, where 
selling the client on anything is verboten, the 
advertiser would be in no better position to 
make a product look favorable in the eyes of 
the buyer than is the client-centered therapist 
in the position to steer the client’s value judg- 
ment in the direction of his own. There is a 
fundamental difference between non-directive 
therapy and advertising. The difference lies 
in the realm of commitments and intentions. 
The advertiser is committed to and intends
to sell. The therapist aims to help the client
achieve more satisfactory adjustment, i.e., 
happiness, regardless of the values adopted 
or discarded in the process of therapy. 
Whereas there is no reason to doubt that some of the elements of
non-directive therapy, such as reduction of threat, might prove
helpful in raising advertising standards, an unqualified comparison
between this type of therapy and advertising is inappropriate.


The Journal of Applied Psychology
Vol. 38, No. 2, 1954 


The Measurement of Academic Freedom 


Willard Kerr 


Illinois Institute of Technology 


Can academic freedom be measured? 
Through most of man’s history it has existed 
in such minute quantity and excited so little 
interest as to discourage evaluation. Today, 
despite the current wave of anti-intellectualism, academic freedom
exists in a magnitude unknown to antiquity. But from one institution
to another, where scholars work, there is great variation in academic
freedom.

In 1953, the Academic Freedom Commit- 
tee, Chicago Division of the American Civil 
Liberties Union, attempted to measure aca- 
demic freedom in each of the more than 50 
institutions of higher learning in the State of 
Illinois. With the aid of other members of 
the committee and the ACLU booklet entitled
Academic Freedom and Academic Responsibility,
a two-page “test” of academic
freedom was constructed. It was called the
“Academic Freedom Survey.” It contained
twelve items on rights of students, seven on
rights of teachers, and four general rights.
Each item was answered on a three-point 
scale of “Extent to which right is effectively 
assured—complete; as a general rule; very 
little or none.” Possible scores could range 
between 23 and 69. 
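
The score range quoted above follows from the item counts and the three-point scale, and can be sketched as below. This is a minimal illustration only: the article reports the 23 to 69 range but not the numeric value attached to each answer, so the 1-2-3 coding, the function name `survey_score`, and the sample answer sheets are assumptions.

```python
# Hypothetical coding of the three-point scale; the article reports only
# the 23-69 score range, not the numeric value given to each answer.
SCALE = {"very little or none": 1, "as a general rule": 2, "complete": 3}

# Item counts as stated in the article.
ITEM_COUNTS = {"students": 12, "teachers": 7, "general": 4}
N_ITEMS = sum(ITEM_COUNTS.values())


def survey_score(responses):
    """Sum the coded answers for one completed questionnaire."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(SCALE[answer] for answer in responses)


# Two extreme (invented) answer sheets bracket the possible scores.
lowest = survey_score(["very little or none"] * N_ITEMS)
highest = survey_score(["complete"] * N_ITEMS)
print(N_ITEMS, lowest, highest)
```

Under this assumed coding, a form answered "as a general rule" throughout would fall at the midpoint of the range.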

Design. Approximately 200 of the questionnaires
were mailed to Illinois colleges addressed
to: (a) one administrator, usually
the president; (b) one or more professors; 
and (c) one or more student leaders, usually 
the newspaper editor or student council presi- 
dent. 

Results. A total of 73 replies was received,
and, while analysis of the data still continues,
the obtained data do indicate substantial
freedom variations. The most entrenched
freedoms are: for faculty, freedom from special
requirements (oaths), of association in
faculty organizations, of citizenship activity,
and of research; and for students, in the
choice of faculty advisers. For faculty, the
least secure freedoms relate to faculty self-government,
to tenure (security), and freedom
to criticize curriculum and administration.
For students, the least secure are freedom to
hear outside speakers, to criticize faculty and
administration, to organize associations and
affiliate nationally, of press, of petition, and
of reasonable off-campus activity.
These results suggest that serious deficiencies
exist in academic freedom for both faculty
and students, particularly the latter.
While our young people are expected to hold
themselves ready to fight and die for their
country, yet we withhold from them the reasonable
freedoms which make for individual
responsibility and character growth. This
statement is qualified by the fact that [...]
institutions maintain an admirable [...]
[...]dial attention in a given individual [...]

The Chicago Division, ACLU Academic
Freedom Committee now plans [...]
such surveying, but this time with “Student,”
“Administrator,” or “Faculty” stamped on
each form in order to establish [...]
Union.


Book Reviews 


Lincoln, J.E. Incentive management. Cleve- 
land: The Lincoln Electric Company, 1951. 
Pp. 280. $1.00. 


This volume is written by the president of 
the Lincoln Electric Company, which manu- 
factures electric welding equipment. It is an 
exposition of the rationale for the system of
incentive management upon which the company
is run. The rationale, as digested by
the reviewer, is as follows: 


1. The primary goal of industry is to make
a better product to be sold to more people at
a lower price; a reasonable profit to the stockholders
is also important but should be a secondary
by-product.

2. This goal is possible only under conditions
of free enterprise and ever increasing
efficiency of operation.

3. Such levels of efficiency are possible only
if workers are motivated to develop their latent
abilities, which are limitless under proper
incentive conditions.

4. Workers will develop their latent abilities
only if they are given a direct reward for
their individual contribution to production.

5. This direct reward is obtained through
an incentive wage system and recognition of
the individual’s ability.
As can be seen from the above, the rationale
is essentially an application, in modern
industry, of the law of competitive struggle
for existence and the survival of the fit.
Incited by the knowledge of reward and recognition
for demonstrated superiority, human
beings have limitless possibilities of improvement.
Through “intelligent selfishness” man
strives on and on toward perfection and great
strides of progress result.

The author is thoroughly convinced that
this rationale is true. Why is he so sure?
Because under his management according to
these beliefs, his company has become the
most productive organization in the industry.
Charts and tables (in the appendix) show
that prices of Lincoln-made products steadily
declined from 1933 to 1949 while those of
comparable products increased. Sales value
per employee is double that of the [...]
industries and other companies in
the same industry. There is no union; there
have been no work-stoppages due to labor-management
disputes in any year from 1934
to 1949. Productivity increased 15% per year
from 1934 to 1949 compared with only 3%
per year for all manufacturing industries. The
average Total Compensation per Employee 
was $7701 in 1950 compared with between 
$3000 and $4000 for six other well-known 
companies, some of which are competitors. 
As the author states, “The conclusion that 
must be drawn from these facts is obvious 
. . . The American economy must adopt in- 
centive management.” 

This is a very difficult book for a psycholo- 
gist to evaluate. Research-oriented, he looks 
for a statement of hypotheses, description of 
procedures designed to test the hypotheses, 
presentation and interpretation of results, 
and conclusions derived from the findings. 
In this book, however, the author presents 
merely an exposition of the “hypotheses,” 
which he presents as axioms, and his proof 
of their validity is the ultimate criterion of 
the production record of the company. Just 
how the principles are translated into opera- 
tion procedures and what the relative con- 
tribution of these procedures is to the over- 
all success of the company are not given, 
either in this volume or in a previous book 
entitled Lincoln’s Incentive System. The 
reader is left with a feeling of something 
missing and with nothing to evaluate objec- 
tively and critically. It is like a father who 
dogmatically states that proper diet results 
in healthy children and proudly points to his 
six-foot son as the proof. Can one conclude 
that diet increases height? Certainly not 
without knowledge of what diet, how administered,
and of related variables such as exercise,
height of parents and grandparents, etc.

This is not necessarily to deny his thesis;
it is just that he hasn’t proved it. In fact,
except for the fundamental weakness of any
explanation of human behavior which rests
upon a single source of motivation, the exposition
of his thesis is a fairly well-reasoned,
consistent, thought-provoking presentation. It
is not difficult to accept his premises of the
desirability of direct and immediate reward
for individual effort, of the importance of
overt recognition of individual achievement,
of individual identification with group goals
through stock ownership in the company, of
the greater value of the earned security resulting
from self-confidence and assurance of
reward than that granted by a paternalistic 
employer or government. The author has a 
supreme confidence in the ability of man to 
rise to new heights of performance and em- 
phasizes employee development rather than 
selection. The industrial progress which has 
made us a leading nation can be maintained 
only with the continuous change resulting 
from the struggle for existence under condi- 
tions of free competition. The “profit mo- 
tive” is reinstated in full force but with the 
“profit” more equitably distributed—the ma- 
jor share going to the consumer, through 
lower prices, and thereby to the workers who 
themselves are consumers. 

One wonders, however, whether the Lin- 
coln Incentive System is universally appli- 
cable. It is conceivable that it works at 
Lincoln primarily because it is unique. No- 
where in the book does Lincoln discuss selec- 
tion standards or turnover or what happens 
to those employees who do not produce at a 
high level. Assuming that they are weeded 
out or allowed to weed themselves out 
through low returns from piece-work wages, 
the company may have a highly selected work 
force—selected in terms of their responsive- 
ness to the particular type of incentive and 
the level of performance required by his sys- 
tem. One is reminded of the tremendous 
spurt to production accompanying Ford's in- 
troduction of the $5 a day wage but also of 
the levelling off which resulted as time went 
on. 

One is forced to admire the positiveness 
with which Lincoln obviously believes in the 
philosophy underlying his system and the 
courage with which he applies it. He is not 
“afraid” to pay a low-level production worker 
whatever the worker can earn under piece- 
work rates rigidly maintained. He is much 
more concerned with the benefit to the con- 
sumer than to the stockholder, who adds 
nothing to the productive effort. He has 
sincerely attempted to put his beliefs into 
actual practice without compromise and is
thoroughly convinced of the validity of his
beliefs. As he states in the companion volume
previously referred to, “Whatever the
fa ad of the reader, there is no doubt 
that the incentive-management philosophy
outlined herein is fundamental to man,
whether he is playing a game, raising a
garden, or living a life.” I wish that psy- 
chologists could be as positive in their knowl- 
edge of human behavior and its application 
to life situations.

Psychologists can profitably read this book.
They will feel successively annoyed, amused,
disturbed, provoked, and challenged. The
price is only one dollar, an example of the
author’s philosophy of lower prices for the
consumer.
Albert S. Thompson 


Teachers College, 
Columbia University 


Viteles, Morris S. Motivation and morale in
industry. New York: Norton, 1953. Pp.
xvi + 510. $9.50.

Any new book by Viteles is bound to command
attention. As one of the first and
one of the leading industrial psychologists,
his work merits and gets the attention of
workers in the field.

Viteles’ well known Industrial Psychology
was first published in 1932, and since that
time has been considered a classic, if not the
classic text in the field. Drawing heavily
upon the experience of psychologists in the
laboratory as well as in industry, Viteles gave
a comprehensive picture of the development
and current status of industrial psychology,
which at that time was considerably less
robust than it is today. Advocating the view
that the scope of industrial psychology is
as extensive as that of psychology itself,
Viteles nevertheless emphasized individual
differences.

In many respects, Motivation and Morale
in Industry is a continuation of Industrial
Psychology. To a considerable extent Viteles
has repeated his earlier pattern, but with a
shift in emphasis from the individual to the
group. He is still interested in increasing
productivity, but his frame of reference is
employee satisfaction and industrial morale.
Again he has drawn heavily upon the experience
of psychologists in the laboratory as
well as in industry. Again he has synthesized
the work of other persons. Again he has
pointed out trends and probable trends.

Motivation and Morale in Industry is divided
into five parts. The first, consisting of


three chapters, is introductory in nature. It 
deals primarily with the economic man and 
the inadequacy of the concept that man can 
live by bread alone. The fifth part, consist- 
ing of four chapters, summarizes and draws 
together the remainder of the book as well 
as makes applications and recommendations. 
The remaining three parts, totaling sixteen 
chapters, comprise the bulk of the book. 
They deal with motivational theory, experi- 
mental studies, and employee attitude surveys. 
Motivation and Morale in Industry is eclectic.
The bibliography refers to books and
articles from all fields of psychology. Psychoanalytical,
topological, and Gestalt psychology
are represented as well as the more
traditional fields. Various allied disciplines
are also represented such as philosophy, economics,
sociology, endocrinology, medicine,
and anthropology. Reference is made to
publications in various languages and to research
conducted in various countries. Included
are Canada, England, Germany, Russia,
and the Netherlands. An extensive period
of time is covered, from William James
to the present. Business and trade publications
are referred to; for example, National
Industrial Conference Board, National Association
of Manufacturers, Factory, and Dun’s
Review. Nontechnical journals and books
are also included such as New York Times
Magazine, Survey Graphic, Fortune, and
Reader’s Digest.

The book is scholarly. In fact it is probably
too scholarly to be of maximum value
and use to the audience to whom it is directed:
management in business and industry.
The book is far more suitable for students
preparing for work in management. The
typical business executive is impeded rather
than helped by phrases such as “sine qua
non,” “in vacuo,” and the like. He does not
need nor desire references in foreign languages.
He does not readily accept the involved
sentence structure or the laborious
style used by Viteles. A Flesch count of this
book would place it far beyond the median
reading level of most management
people. A re-write of Motivation and Morale
in Industry will be required if it is to gain
acceptance in industry. Viteles did this
earlier when Industrial Psychology was rewritten
to fill the need for a shorter and
simpler volume, and was published in 1934
under the title The Science of Work.

Viteles has written his book from a theo- 
retical and experimental viewpoint. This is 
well illustrated by his statement “Effective 
results can be achieved only through system- 
atic research conducted within a sound theo- 
retical context” (p. 66). Would that all 
workers in industrial psychology took this 
view! 

In a sense this book is too much a book of 
readings in motivation and morale in in- 
dustry. Many of the studies are weak, but 
Viteles has done an excellent service in col- 
lecting these studies in such way as to illus- 
trate the primitive status of the field. Fre- 
quently he has added his penetrating insights 
relative to such studies. Nevertheless, the 
reviewer regretted that Viteles had not taken 
a more directly critical view. It were as 
though a skilled surgeon held his scalpel to 
the skin but neglected to make a sharp and 
deep incision. Why? Does Viteles feel less 
sure of himself in the area of the group than 
in that of the individual? Do his own feel- 
ings emphasize the individual, but his intel- 
lect tell him to emphasize the group? Or, is 
he a highly tolerant man who is convinced 
that more harm than good would result from 
a more critical attitude at this time? 

The book was published prematurely in one 
respect. References were added after the 
type was set without changing numbering. 
Thus, the same number frequently appears 
successively, the second being followed by a 
letter subscript. This may be minor, but is 
apt to give some readers the impression of 
haste or carelessness which is inappropriate 
in a book of this type. It is hoped that re- 
printing will correct this defect. 

In spite of its deficiencies, this is a book
which should be studied carefully by all who
profess to be interested in industrial psychology.
It pulls together much material which
has lacked structuralization. In so doing
Viteles has done a valuable service albeit the
material is primitive. Out of such syntheses
can come considerable improvement in future
motivational theory and experimentation.

In writing this book, Viteles has not blindly
jumped onto the band wagon of “group
think.” This is particularly refreshing inasmuch
as so many psychologists seem to disregard
their own teachings and to follow the
“all or none” hypothesis in evaluating schools,
viewpoints, methods, and procedures in the
field of psychology. As Viteles points out,
“The emergence of a ‘social psychology’ does
not require or justify the abandonment of
‘individual psychology’ in approaching or
solving the problems of motivation and morale
in industry” (p. 391).


Clifford E. Jurgensen 
Minneapolis Gas Company 


Redfield, Charles E. Communication in management.
Chicago: University of Chicago
Press, 1953. Pp. xvi + 290. $3.75.

Redfield’s book presents an excellent broad
view of the problem of communication in
industry as well as information on how to
handle rather specific problems. The author
states that while the means of communication
have now reached their greatest development,
“intelligibility” in industrial communication
is at its lowest stage in history. This, he
says, is due to: (a) the increasing size of
modern organizations; (b) lack of training in
wise language usage; and (c) the specialization
and segmentation of work today. Of
the importance of communication, however,
Redfield leaves no doubt when he quotes
Fortune’s new motto for business, “Communicate
or Founder.”

The book is arranged in five parts. The
first part provides a general introduction to
the problem, and contains highly useful guiding
principles for effective communication.
It is necessarily general in scope, but it does
seem to give too little attention to one aspect
of communication, effectiveness as a
function of the educational differences of
“communicator” and “communicatee.” The
goal is stated as having members of the audi- 
ence improve their language facility (as well 
as having the communicator improve his way 
of using language). This is desirable, but 
the practical question remains whether or not 
communication can be effective if the reader 
or hearer cannot understand. It would have 
seemed worth while to present more informa- 
tion on how to make writing and reading more 
understandable, and how to check on this 
through readability formulas. Redfield, it
should be said, does not deny the importance
of the problem, however, for he says earlier
that “In the America of the 1950’s, literacy
will have to be measured in terms of comprehension
of transmitted ideas and concepts.”

Part II of the book takes up “communication
downward and outward,” the most important
aspect of which is order-giving. After
a description of kinds of orders, Redfield goes
on to a discussion of oral versus written
presentation. He then takes up individual
messages and circulars, manuals, and handbooks.
The presentation is thorough, but the
very thoroughness in itself leads to some generalizations
that may not always be accurate.
For example, Redfield says a safe rule of
thumb in distinguishing manuals and handbooks
is that “if personal pronouns appear in
the text, it is a handbook and not a manual.”
This is, however, a minor point as far as the
whole presentation is concerned.

In Part III, Redfield presents “communication
upward and inward.” He gives chief
attention to “administrative reporting” as essential
to the executive, but also takes up
suggestion (and complaint) systems, interviews,
and employee opinion polls. The over-all
presentation is excellent, and should introduce
new approaches to many readers.

Part IV of the book is an interesting presentation
of “horizontal communication,”
such cross-talk as clearance, review, and conferences.
Horizontal communication, as Redfield
points out, is of increasing importance
because of growing specialization in industry.

In the final section of the book,
Redfield presents his views of the future of
communication in management. The presentation
is largely in terms of organization [...]
management and its relation to communication.
Recent changes in organizational structure
(reduction of number of management
levels) in several large corporations, and the
effect on communication, provide interesting
reading.

All in all, the book is a valuable one,
chiefly for its survey of the field and its complete
list of references and selected readings.
It should prove useful to most readers concerned
with management, but particularly to
those who have not recognized the amount of
communication that goes on in industry and
how more effective communication can improve
industrial efficiency.

George Klare
University of Illinois

Tyler, Leona E. The work of the counselor. 
New York: Appleton-Century-Crofts, 1953. 
Pp. 323. $3.00. 

During the past four years an unusual num- 
ber of textbooks on counseling have appeared. 
Some of the texts have been elaborations or 
developments of the nondirective point of 
view; some have been restatements of older
points of view modified to incorporate a
greater emphasis on counseling as contrasted
with diagnosis; and one or two have been attempts
at something like a synthesis. Tyler’s
text belongs in this last category, and, in this
reviewer’s judgment, is outstandingly successful
in this class.

As the title indicates, Tyler has attempted,
not to describe a theory of counseling, but to
write of the peculiar work of the counselor,
marshalling ideas from experience and from
research to throw light on how counseling
may most successfully be done. It is therefore
an eclectic book in its approach, predominantly
nondirective in its philosophy and
techniques, but making use of the contributions
of testing, occupational information, and
environmental resources in a manner more
commonly associated with other points of
view. Tyler makes her own synthesis of
these approaches. The result is a very readable
text, suitable for relatively unsophisticated
students, in which each chapter concludes
with a concise critical summary of
relevant research which makes the text appropriate
for students with more background
and for practitioners.

The functions of the counselor in modern
society are effectively dealt with in Chapter
I, thus starting out by putting the counselor’s
work in good social and psychological perspective.
Chapter II discusses interviewing,
[...] the perceptual skills of the counselor
and reflection of feeling as a tool but points
out that these are procedures used by one
person communicating with another,
not tricks of the trade. Nondirective theory
is [...] here, for example, with the [...]
that verbal structuring is of [...]
value, that effective structuring is behavioral
rather than verbal. Chapter III deals with
records in a manner that is refreshing among 
texts of this type: instead of discussing the 
construction of cumulative records, Tyler 
treats them as aids to counseling, as sources 
of hypotheses to explore in counseling, as a 
means of orientation to a client rather than 
as bases for diagnosis. She conceives of the 
counselor’s province as being the client’s feelings
and attitudes, not objective facts, and
she would leave these and the manipulation 
of the environment largely to other person- 
nel workers. 

The chapter on diagnosis therefore recommends
that counseling not be organized
around this activity, as it typically is in non-Rogerian
settings, but that diagnostic activities
be relied upon for initial screening and
particularly as means of helping the client to
understand himself. Data showing that clinical
predictions are not valid, but that counseling
with tests improves vocational decision-making
are cited, and ways of helping clients
use test results are discussed in a manner 
which effectively brings together the contribu- 
tions of nondirective and diagnostic counsel- 
ing. Chapters V and VI deal with tests, 
leaving data on the construction and valida- 
tion of specific tests to other textbooks, and 
concentrating on what tests can contribute to 
the self-understanding of the client and how 
the counselor can use them for this purpose. 
The generally admitted desirability of at least 
one nondirective interview before testing so 
that problems may be aired, the more de- 
batable advantage of testing by batteries in- 
stead of giving a single test when interview- 
ing brings up the need for that kind of fact 
and other tests as other facts are needed, and 
the equally debatable value (in this reviewer's
opinion) of written reports for clients, are
brought out.

The chapter on occupational information 
also stresses the use of such information in 
counseling, although brief attention is paid 
to sources in passing. Thus the distinctive 
emphasis of this text is maintained, relying 
on standard texts for information on sources 
and tools and concentrating on how the counselor
uses them in counseling. The stress on
occupational information which characterized
early vocational guidance, the later rejection
of this method by some in favor of testing
and still later by others in favor of counseling
concerning attitudes, are placed in nice
perspective (although some details of histori- 
cal explanation are incorrect as in the failure 
to recognize that early writers such as Par- 
sons also advocated self-understanding and 
counseling), and a synthesis of these ap- 
proaches and methods such as that which 
characterizes much of the best contemporary 
counseling is achieved. Occupational infor- 
mation is seen as a means of reality testing. 

Chapter VIII deals with psychotherapy, 
and Chapter IX with decision-making inter- 
views, thereby putting this text practically in 
a class by itself for comprehensiveness and 
balance in coverage. Tyler stresses the unity 
of the person and hence of counseling, laments 
false distinctions between personal and voca- 
tional counseling (still incorrectly attributed 
to the Veterans Administration), argues in 
favor of counseling which deals with voca- 
tional choice as part of the development of 
the person, and at the same time recognizes 
that people do have to make occupational de- 
cisions. In dealing with psychotherapy she 
stresses the importance of the relationship,
and makes the nice point that reflection of
feeling is not so much a technique of treatment
as a means of conveying to the client
that communication is taking place. Tyler
is appropriately modest concerning our knowledge
of psychotherapy, and points out issues
concerning which we lack information. The
analysis of the processes of decision-making
and of counseling in this connection is original
and helpful.

In Chapter IX the school counselor is
placed in the context of the school as one personnel
worker, with the peculiar function of
trying not to decide things for the student.
This is a helpful distinction between counseling
and administrative functions, but not one
which fits the school counselor’s job as structured
in most schools, where the counselor is
also expected to handle discipline, programming,
and a variety of decision-forcing, as
contrasted with facilitating, functions. The
use of community resources and agencies by
the counselor is discussed, but not in any
detail.

A chapter on the selection and training of
counselors, and one on evaluation, bring the
book to a close. The former mentions various
professional associations, but makes no
mention of the American Personnel and Guidance
Association as that which, in 1950, resulted
from the unification of all but one of
those listed, follows the style of the Michigan
Conference in referring to counselor psychologists
instead of the more recent officially
adopted APA term of counseling psychologists,
and fails to recognize that many counselors
and counseling psychologists are employed
in community agencies, hospitals, and
industrial or business concerns. It is otherwise
up-to-date and helpful, particularly in
its discussion of the self-selective functioning
of a good counselor-training program provided
there has been initial screening for
academic ability.

The final chapter is an excellent critical
review of evaluative studies, except for the
curious failure to note the inadequacy of
Latham’s study which results from its attempt
to relate test scores to occupational
success after one year of work (shown by
career pattern research and longer-term follow-ups
to be too brief and early a period to
be meaningful), the even stranger omission
of Strong’s studies on the occupational predictive
value of tests, and the final erroneous
conclusion which Tyler therefore reaches to
the effect that tests have no predictive value
for occupational success and satisfaction.

Three appendices include an intake [...],
notes on some interviews, and selected readings.
The first two are not coordinated with
the text and hence have little value beyond
what the reader can derive from examining
them himself; the last contains a number of
helpful references, but excludes all texts
on counseling other than the nondirective
(e.g., Robinson, Hahn and McLean, Williamson),
surely a mistake in a text which does
as good a job of synthesizing viewpoints as
does this.

A few criticisms of details and a fe efor? 
jor weaknesses should be mentioned, i 
reaching an over-all evaluation. oth! 

The apparent desire to write a SPO ye 
reading, easily digested text occasion jes 
sults in less specificity of facts than ÍS fË 
able, as in the failure to mention the “ag? 
as the USES test under discussion °” 


Book Reviews 


130. 
Fn ane furthermore, in slighting the 
and Dickson ideas, for while Roethlisberger 
oped a ce iii its as having devel- 
with ocean Meant approach simultaneously 
as stated) oa ut in 1939 rather than 1937 
mentioned to Rank and Jessie Taft are not 
other point as important precursors. Many 
text appear made and ideas expressed in the 
no indicatio as though they were Tyler’s, with 
they are a = of when they are original, when 
rary esta of the thinking of contempo- 
are novel ing psychologists, or when they 
chologists ideas first expressed by other psy- 
he contri „the literature on counseling. 
ing get i of others to Tyler’s think- 
Contributior gnition only if they are research 
tributions oe for the only theoretical con- 
itective s acknowledged are the two non- 
and “etme mentioned above and Bordin 
although bec on test selection by clients, 
er’s writing ers are clearly traceable in Ty- 
a subtitle 8. Finally, the book should have 
Pointed out In Educational Settings,” for as 
marily of a aboye it is written in terms pri- 
versity, and e counselor in a college or uni- 
Counselor a a lesser extent the high school 
counselors t disregards the fact that many 
and haa in social agency, medical, 
Not lessen th settings. This limitation does 
or techni he value of the book for theory 
of some ee but it does make its discussion 
it might lea problems less valuable than 
n this e to these other counselors. s 
the first reviewer’s judgment, this book is 
What we penine attempt at a synthesis of 
ressors ha now about counseling. Its prede- 
y individ, e described approaches developed 
Working oe or groups of psychologists 
lave nee in one setting, and hence 
ions and viased by the theoretical predilec- 
tibutors experimental limitations of the con- 
Boret Tyler has, as pointed out above, 
Unselin ical bias in favor of nondirective 
Anma one which caused her inadequately 
E ve arize and evaluate the research on 
She oo predictive value of tests. 
her ieee had limited experience in 
é is anager ei settings, which results 
tk in i na of community resources an 
“search a er settings. But she has drawn on 
S criti nd theory regardless of school and 
ically examined her own work, and 




has thus achieved a breadth of viewpoint, 
variety of technique, and comprehensiveness
of scope which make her book unique. To 
put it in a nutshell, although it seems to this 
reviewer that the book shows an as yet in- 
complete recovery from the impact of the 
nondirectivists, it is an extremely valuable 
text which many of us active counseling psy- 
chologists would be glad to have written our- 
selves! 
Donald E. Super 


Teachers College, Columbia University 


Recommended practice for residence lighting. 
New York: Illuminating Engineering So- 
ciety, 1953. Pp. 44. $1.00. 

This pamphlet, prepared by the Committee on Residence Lighting of the
I.E.S., contains information useful to architects and to applied
psychologists who are concerned with specifying illumination which will
provide an attractive living space as well as comfortable and efficient
vision in the home. Important developments in the field provide a basis
for marked improvement over the first Recommended Practice of Home
Lighting which appeared in 1945. The present pamphlet is concerned mainly
with basic lighting requirements for family activities which involve
close vision.

It is gratifying to find a strong emphasis placed upon those factors
which promote comfortable vision. Among these are: (1) a coordination of
decoration (painting) with lighting to achieve satisfactory distribution
of illumination; (2) the maintaining of satisfactory brightness ratios in
the field of view and the surroundings; (3) selection of light sources;
and (4) lighting for specific visual tasks such as sewing, dining, etc.

The numerous pictures and figures illustrating types of fixtures and
desirable arrangements of lighting for specific seeing tasks are well
chosen. Limitations as well as uses are incorporated into much of the
discussion. Helpful materials are given in the appendix: detailed
description of typical incandescent lamps and of fluorescent tubes,
luminaire classification, lighting maintenance, and glossary of technical
terms.

There have been two rather marked increases in the light intensities
recommended in 1953 in comparison with those recommended in 1945: for
sewing dark fabrics, 150 from 100 footcandles; average sewing, 80 from 40
footcandles. In several instances the recommended intensities are higher
than can be justified by research findings. Except where casual seeing is
involved, the tendency is to recommend at least 40 footcandles.

This is, in general, an excellent pamphlet 
on home lighting. The careful reader with 
a knowledge of the field can approve of all 
the material except the recommended light 
intensities. 
Miles A. Tinker 


University of Minnesota 


Bullock, Robert P. Social factors related to 
job satisfaction, a technique for the meas- 
urement of job satisfaction. Research 
Monograph Number 70. Columbus, Ohio: 
Bureau of Business Research, 1952. Pp. 
105. $2.00. 

This monograph is the report of a research 
study designed to discover the relationship of 
certain social factors to job-satisfaction and 
to employ these factors in a scale for the 
measurement of job-satisfaction. The basic 
assumption underlying the study is that the 
individual’s work behavior and adjustment 
depend upon his sentiments and attitudes. 
It is further assumed that these sentiments 
and attitudes are a result of his attempt to 
achieve personal adjustment within at least 
three separate, interacting social systems: 
the informal work group, the formal work 
organization and the larger social community
within which the employing industry is
located. In this study, job-satisfaction is 
considered to be an attitude resulting “from 
a balancing and summation of many specific 
likes and dislikes experienced in connection 
with the job.” 

Two measuring instruments were prepared, a Job-Satisfaction Scale for use
as a criterion and a Social-Factor Questionnaire. The Job-Satisfaction
Scale was of the multiple answer type patterned closely after the Hoppock
scales. The Social-Factor Questionnaire consisted of 129 items designed to
inventory conditions on the job, in the home, in the community and
attitudes of the worker. Seventy-five of these were in Y, ?, N format,
thirty were Agree, ?, Disagree items. Twenty-four were multiple answer
questions sampling personal background information. (All are presented
for the reader’s examination in appendices to the monograph.)

The instrument was tried on a group of 53 male juniors and seniors in
college, all of whom had held full time jobs. Validation was accomplished
on this group and on two samples from an animal registration association.
One hundred currently employed persons comprised the first sample and
12 [?] ex-employees the second. The Job-Satisfaction Scale was checked by
testing its ability to differentiate between groups judged “satisfied”
and those judged “dissatisfied” (judgments made by a panel on the basis
of personnel data), between individuals who gave “satisfied” answers to
three factual questions from those who did not and between employees and
ex-employees. This last differentiation was required on the assumption
that dissatisfaction might be more intense and more frequently associated
with termination of employment.
To assess the validities of the Social Factor Questionnaire items,
individuals from all three samples were divided into groups on the basis
of Job-Satisfaction Scale scores. Each item was then evaluated in terms of
the CR of the difference between “Satisfied” group responses and the
“Dissatisfied” group responses. Objections to the instability of CR in
small samples were met by requiring high CR’s in all three samples.
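
The item-evaluation step described above can be sketched as follows. This is only an illustration: it assumes that "CR" is the critical ratio of a difference between two independent proportions with an unpooled standard error, which the review does not spell out. The function name and the response figures are hypothetical, not taken from the monograph.

```python
import math


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between two independent proportions divided by its
    (unpooled) standard error. Assumed form of the CR; the monograph's
    exact formula is not given in the review."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Hypothetical item: 70% of 50 "satisfied" and 40% of 50 "dissatisfied"
# respondents answer "Yes".
cr = critical_ratio(0.70, 50, 0.40, 50)
print(round(cr, 2))
```

Requiring a high CR in each of the three samples, as the author did, guards against an item that separates the groups in one small sample by chance alone.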
The author deserves commendation for his adaptation of the Social Factor
Questionnaire to the measurement of job-satisfaction and for his attempt
to validate his instrument. All too frequently measures in this area are
offered to the public without any systematic
attempt at validation. Further a om 
however, is necessary before men E popie 


for he 
tion utilized was probably well chona E at 


were non-union suggest the need 
validation. Ri y, 
Personnel Research Branch, 

TAGO, Department of the Army 


Howard L- om 




New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson,
Editor, Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota.


Class, status and power. Reinhard Bendix
and Seymour Martin Lipset, Editors. Glencoe,
Ill.: The Free Press, 1953. Pp. 732.
$7.50.

Science and man’s behavior. Trigant Burrow.
New York: Philosophical Library,
1953. Pp. 564. $6.00.

The teaching-learning process. Nathaniel
Cantor. New York: The Dryden Press,
1953. Pp. 350. $2.90.

Fundamental psychiatry. John R. Cavanagh
and James B. McGoldrick. Milwaukee:
The Bruce Publishing Company, 1953.
Pp. 582. $5.50.

The transfer value of guided learning. Robert
C. Craig. New York: Bureau of Publications,
Teachers College, Columbia University,
1953. Pp. 85. $2.75.

The role of growth hormone in carbohydrate
metabolism. R. C. De Bodo and M. W.
Sinkoff. New York: The New York Academy
of Sciences, 1953. Pp. 38. $1.00.

sales department looks at costs. M. J.
Dooher, Editor. New York: American
Management Association, 1953. Pp. 30.

The American sexual tragedy. Albert Ellis.
New York: Twayne Publishers, 1954.
$4.50.

Self-perception in the university. Edgar Z.
Friedenberg and Julius A. Roth. Chicago:
The University of Chicago Press, 1953.
Pp. 102. $1.75.

The human senses. Frank A. Geldard. New
York: John Wiley & Sons, 1953. Pp. 365.
$5.00.

Functional motor efficiency of the eyes and
its relation to reading. Luther C. Gilbert.
Berkeley: University of California Press,
1953. Pp. 231. $1.00.

A clinical approach to children’s Rorschach.
Florence Halpern. New York: Grune &
Stratton, Inc., 1953. Pp. 270. $6.00.

The mechanism of corticosteroid action in disease
processes. Oscar Hechter. New York:
The New York Academy of Sciences, 1953.
Pp. 192. $3.50.


Introduction to psychology. Ernest R. Hilgard.
New York: Harcourt, Brace and
Company, Inc. Text Edition, $5.75. Student
Guide and Workbook, $1.50.

Sex ethics and the Kinsey reports. Seward
Hiltner. New York: Associated Press,
1953. Pp. 238. $3.00.

Religion, science and human crises. Francis 
L. K. Hsu. New York: Grove Press, 1952. 
Pp. 142. $3.50. 

Two essays on analytical psychology. C. G. 
Jung. New York: Bollingen Foundation, 
Inc., 1953. Pp. 329. $3.75. 

A speculation in reality. Irving F. Lauks. 
New York: Philosophical Library, 1953. 
Pp. 154. $3.75. 

Adolescence. Marguerite Malm and Olis G.
Jamison. New York: McGraw-Hill Book 
Company, Inc., 1953. Pp. 512. $5.00. 

Men and unions. John G. Mapes. New 
York: Group Attitudes Corporation, 500 
Fifth Avenue, 1953. Pp. 36. $1.00. 

The achievement motive. David C. McClel- 
land, John W. Atkinson, Russell A. Clark, 
and Edgar L. Lowell. New York: Apple- 
ton-Century-Crofts, Inc., 1953. Pp. 424. 
$6.00.

And lo, the star. Margaret Aikins McGarr.
New York: Pageant Press, 1953. Pp. 116. 
$2.50. 

Mental health in the home. Laurence Spur- 
geon McLeod. New York: Twayne Pub- 
lishers, 1953. Pp. 243. $3.50. 

Techniques of living. William H. Mikesell. 
Harrisburg, Pa.: The Stackpole Company, 
1953. Pp. 338. $3.95. 

Psychoanalysis and personality. Joseph Nut- 
tin. New York: Sheed and Ward, Inc., 
1953. $4.00. 

Method and theory in experimental psychology.
Charles E. Osgood. New York: Oxford
University Press, 1953. Pp. 976.
$10.00.

Education and society. A. K. C. Ottaway.
New York: Grove Press, 1954. Pp. 182.


Personality and adjustment. William L. 
Patty and Louise Snyder Johnson. New 
York: McGraw-Hill Book Company, Inc., 
1953. Pp. 403. $4.75. 

Child psychology. Leigh Peck. Boston: D. 
C. Heath and Company, 1953. Pp. 536. 
$5.25. 

Conciliation in action. Edward Peters. New 
London, Conn.: National Foremen’s Insti- 
tute, Inc., 1953. $4.50. 

The child’s conception of number. Jean 
Piaget. New York: The Humanities Press, 
Inc., 1953. Pp. 248. $5.00. 

Shame and guilt. Gerhart Piers and Milton 
B. Singer. Springfield, Ill.: Charles C 
Thomas, Publisher, 1953. Pp. 86. $3.25. 

Adrenal cortex. Elaine P. Ralli, Editor. 
New York: Josiah Macy, Jr. Foundation, 
1953. Pp. 165. $4.00. 

Existential psychoanalysis. Jean-Paul Sartre. 
New York: Philosophical Library, 1953. 
Pp. 275. $4.75.

The adolescent: A book of readings. Jerome 
M. Seidman, Editor. New York: The 
Dryden Press, 1953. Pp. 798. $4.50. 

Know your doctor. Leo Smollar and Neil 
Morgan. Boston: Little, Brown & Com- 
pany, 1954. Pp. 173. $3.00. 

Father relations of war-born children. Lois 
Meek Stolz. Stanford, Calif.: Stanford 
University Press, 1954. Pp. 365. $4.00. 

Saving children from delinquency. D. H. 
Stott. New York: Philosophical Library, 
1953. Pp. 266. $4.75. 

Handwriting: A personality projection. Frank 
Victor. Springfield, Ill.: Charles C Thomas, 
Publisher, 1953. Pp. 168. $1.75. 

The psychology of thinking. W. Edgar 
Vinacke. New York: McGraw-Hill Book 
Co., Inc. Pp. 370. $6.00. 




Cybernetics. Heinz Von Foerster, Editor.
New York: Josiah Macy, Jr. Foundation,
1953. Pp. 184. $4.00.

Hypnotism: An objective study in suggestibility.
André M. Weitzenhoffer. New
York: John Wiley & Sons, Inc., 1953. Pp.
380. $6.00.

An introduction to scientific research. E.
Bright Wilson, Jr. New York: McGraw-Hill
Book Company, Inc., 1953.
$6.00.

Driver characteristics and accidents. Highway
Research Board, Washington, D. C.:
National Academy of Sciences-National
Research Council, 1953. Pp. 54.

Report of highway safety research correlation
conferences. Committee on Highway Safety
Research. Washington, D. C.: National
Academy of Sciences-National Research
Council, 1952. Pp. 63.

The field of highway safety research. Committee
on Highway Safety Research. Washington,
D. C.: National Academy of Sciences-National
Research Council, 1952.
Pp. 42.

I.E.S. recommended practice for residence
lighting. I.E.S. Committee on Residence
Lighting. New York: Publications Office,
Illuminating Engineering Society, 1860
Broadway. Pp. 44. $1.00.

The Social Welfare Forum. National Conference
of Social Work. New York: Columbia
University Press, 1953. Pp. 305.
$5.00.

Group report of a program of research in
psychotherapy. Psychotherapy Research
Group, The Pennsylvania State College.
State College, Pa.: William U. Snyder, Department
of Psychology. Pp. 177.




Journal of Applied Psychology 


VoL. 38, No. 3 


June, 1954 


Personality Self-Assessment of Scientific and Technical Personnel


R. H. Van Zelst 


Kroh-Wagner Company 


and 


W. A. Kerr


Illinois Institute of Technology 


Png n of personality existence is 
isctebes Hehe are the selves perceived by 
cated oo and there are “selves” (compli- 
too, the ) as perceived by the self. Then, 
“preety is the “paper-and-pencil self,” the 
other pr self,” the “under stress self,” and 
tend i panelled selves. These selves 
tend to be unlike each other even for the
same person because they exist within different
frames of reference.

It is a reasonable hypothesis that the
person in normal society who is best informed
about an individual's personality is
that individual himself. Further, it appears
plausible that many traits of his personality
can be self-assessed with substantial
validity (1, 4, 5).


Method 


Rationale. For at least three decades both psychology and psychiatry
have been preoccupied with external assessment of personality, and the
validity coefficients culminating from thirty years of effort have been
less than gratifying. In fact, for predicting such a criterion as job
success they are characteristically near zero or non-existent (2, 4, 6).
This experience might now well lead a researcher to ask: “Who is likely
to know the most about a given personality? Can a personality assess
itself?” Have we overlooked the obvious?

Conventional paper-and-pencil personality tests attempt to infer a trait
by measurement of symptoms assumed to be functions of the trait. Although
no measurement ever is direct, this approach introduces the obscuring
influence of such intervening variables as shaky assumptions about
symptoms and the poor reliability of symptom-type items based even on
relatively sound symptom assumptions.

The directive clinical assessment approach compounds the errors of the
conventional test approach by, in effect, introducing two additional
intervening variables: (1) the personality of the clinician; and (2) the
limited knowledge of the client possessed by the clinician.
These limitations of traditional assessment seem further to suggest
rewarding validity in a self-assessment approach. The present study is
based also on the assumptions that personality assessment is most valid
when: (1) the emphasis is metric rather than impressionistic; and (2)
trait concepts are maximized while verbalization is minimized.

Subjects. Subjects of this study were 514 of the technical and scientific
personnel of the Armour Research Foundation (79%) and the Illinois
Institute of Technology (21%). Their mean age was 31.9, standard
deviation 9.1.

Procedure. A self-analysis questionnaire was constructed on the basis of
Cattell’s research (3) which lists definitive personality trait names
based on factor and cluster analysis techniques. From this list of traits
56 trait names were selected.

Each subject was guaranteed anonymity and was asked: “Please rate
yourself as compared with fellow scientists on the following traits
utilizing the five point scale as follows: as compared with other
scientists, I probably am 1. much less; 2. less; 3. same; 4. more; 5. much
more”: acquisitive, ambitious, etc. Each respondent also supplied age
(nearest in five-year multiples), number of publications, number of
inventions, and field of work. Of the subjects responding, 70% were in the
field of Engineering and 30% in Physical Sciences.

Criterion. The criterion against which these 
self-assessed traits were evaluated was the 
summation of publications and inventions for 
each respondent. In other words the cri- 
terion was scientific productivity. The influ- 
ence of age was held constant by means of 
partial correlation techniques. Mean produc- 
tivity for the group was 11.4 with a standard 
deviation of 19.1. 
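
Holding age constant by partial correlation can be sketched with the standard first-order formula, computed from three zero-order coefficients. The function name and the example coefficients below are hypothetical illustrations, not values from this study.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant.

    x = self-rating on a trait, y = productivity, z = age (labels assumed
    here for illustration only).
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical case: the trait correlates .30 with productivity and -.20
# with age, while productivity correlates .50 with age.
print(round(partial_r(0.30, -0.20, 0.50), 2))
```

When a trait declines with age while productivity rises with it, removing age raises the trait-productivity coefficient, as in this made-up case.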




Results 


Of the 667 questionnaires distributed (via campus mail with explanatory
letters and return envelopes addressed to “Technical Personnel
Research”) a total of 514 (77%) were returned in usable form.

These 514 self-ratings on each of 56 traits were then correlated
(Pearsonian) with the productivity (inventions plus publications)
criterion. A second series of coefficients was then computed using the
partial method on the original coefficients and holding constant the
effect of age. Both series are shown in Table 1.
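
For orientation, the size of coefficient needed to reach significance with 514 cases can be approximated as below. The article does not say which test the authors used; this sketch assumes a two-tailed test based on the Fisher z transformation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_significant_r(n: int, alpha: float) -> float:
    """Smallest |r| significant at two-tailed level alpha for n cases,
    using the Fisher z approximation (an assumed method, not necessarily
    the authors' own)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


r_01 = min_significant_r(514, 0.01)
r_05 = min_significant_r(514, 0.05)
print(round(r_01, 3), round(r_05, 3))
```

With a sample this large, quite small coefficients differ reliably from zero, so the magnitude of a coefficient is worth weighing alongside its significance.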

The original hypothesis that the self-rating
approach may yield more significant validity
coefficients than the traditional external
evaluation approach seems to be verified.


Table 1 


Pearsonian Correlation Coefficients between Productivity and Personality Trait and between
Productivity with Age Held Constant and Personality Trait *


Trait r fp Trait r a 
Acquisitive — 25. =.30, Imaginative -32 “a 
Ambitious : AO 25 Impulsive J 6 
Argumentative AS 23 Independent 20 13 
Assertive -17 22 Inflexible —.10 733 
cane - = ~.20 Inhibited -2 S 

ious A 05 Interests-wide -16 j 
Constructive 06 .08 Introspective 20 A 
Contented — 44 —.57 Leading 26 “32 
Conventional —4l —.53 Love-work 5 2 i 
Cooperative ee i Optimistic 20 4 

urious 31 40 Origin 47 05 

à ginal Bl 3 
Shear -03 04 Patient —.04 = 16 
wasygoing 00 -00 Painstaking 12 ‘03 
Eccentric 04 05 Perseverin -02 Tj 
Egotistical 10 AZ Poi: e 10 “26 
Emotional .02 .03 ae "20 a 
Enthusiastic 3039 og 6 
ted sm Aa Reliable m od 
Excitable AS 20 Reserved P z 1 
Fastidious 25 30 ai ‘7 a 
Formal 25 Tse ye cae 0 2 
ae ae: Self-Controlled =N w 

i $ i ae 0 
Friendly 06 08 a 2 
Generous AG 21 Baa i 24 ws 
Grateful 07 99 eae 4 05 
Habit-bound 03 04 i o o 
Headstrong 18 oo. Ths 02 K 
Huttied i or l oughtful s ó ai 
Worrying —.20 


* Italicized coefficients significant at 1% level.




This statement is based on the finding of 37
coefficients significant at the 1 per cent
level plus an additional three at the 5 per
cent level. This represents 74 per cent of
the coefficients at a non-chance level, a proportion
not ordinarily found in external evaluation.
öf Ta ( O per cent) of the coefficients are 
gnitude .30 or higher. 
The more promising coefficients present an interesting self-picture of
the highly productive scientists in this study. As a group these high
producers describe themselves as being original (.61), not contented
(-.57), not conventional (-.53), imaginative (.41), curious (.40),
enthusiastic (.39), and impulsive (.39). Also characteristic, but to a
lesser degree, of highly productive personnel are self-descriptions of
self-confident (.35), leading (.34), not worrying (-.34), not inhibited
(-.33), not formal (-.32), loves work (.32), subjective (.31), fastidious
(.30), and not acquisitive (-.30).
To the extent that these scientists are representative, the more
productive scientist appears to describe himself as an enthusiastic
personality which is original, non-conformist, not contented with reality
as it is, curious as to the nature of this reality, and not fundamentally
acquisitive in a selfish sense. The less productive scientists in this
study possess an opposite self-description pattern.

This pattern does not necessarily agree with the popular conception of
the productive scientist. In this sample he is, for example, more
subjective than objective in orientation; but probably this makes for the
greater introspection which may be necessary for better and more original
exploration and interpretation of the unknown. And our highly productive
scientist is not characteristically cautious or inhibited; he is less
cautious, more self-confident, and more impulsive than his less
productive associate. Nor does he lurk modestly in his laboratory; he
engages freely in leading behavior.
n results: are’ consistent with some 
fous research on a related population (7, 
oe in suggesting relative selfless- 
Daik A a as a significant trait in highly 
Ive scientists. 




Summary 


A total of 514 technical and scientific per- 

sonnel of the Armour Research Foundation 
and the Illinois Institute of Technology co- 
operated in an anonymous self-administered 
self-description report on 56 definitive per- 
sonality trait names. These self-ratings were 
correlated with a criterion (inventions plus 
publications) of scientific productivity, hold- 
ing constant the effect of age by partial cor- 
relation. 
1. The original hypothesis that a self-rating
approach to personality evaluation may
yield results of greater validity than ordi- 
narily found in the external evaluation ap- 
proach is not refuted and even appears to be 
somewhat substantiated. Sixty-eight per cent 
of the validity coefficients exceed chance 
magnitude at the 1 per cent level. 

2. As compared with the less productive, 
the more productive scientists in this study 
described themselves as more original, less 
contented, less conventional, more imagina- 
tive, more curious, more enthusiastic, more 
impulsive, and, somewhat less definitely, more 
self-confident, more leading, less worrying, 
less inhibited, less formal, more liking for 
work, more subjective, more fastidious, and 


less acquisitive. 
Received July 28, 1953. 


References 


1. Adams, C. R. and Lepley, W. M. Personal audit.
Chicago, Illinois: Science Research Associates.

2. Buros, O. Fourth mental measurements yearbook.
Highland Park, New Jersey: Gryphon Press,
1953.

3. Cattell, R. B. Description and measurement of
personality. Yonkers, New York: World Book
Co., 1946.

4. Dorcus, R. M. and Jones, M. H. Handbook of
employee selection. New York: McGraw-Hill,
1950.

5. Pennington, L. A. and Berg, I. A. An introduction
to clinical psychology. New York: Ronald
Press Co., 1948.

6. Stagner, R. Psychology of personality. New
York: McGraw-Hill, 2nd edition, 1948.

7. Super, D. E. Appraising vocational fitness by
means of psychological tests. New York:

J. abnorm. soc.

Van Zelst, R. H. and Kerr, W. A. A further note
on some correlat ... productivity. J. abnorm. soc.
Psychol., 1952, 47, 82.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954 


Personality Correlates of Social Conformity 


Raymond E. Bernberg 


Los Angeles State College 


The author has recently introduced a scale 
(2) which is presumed to measure social con- 
formity. Social conformity is defined as the 
tendencies of members of a society to mani- 
fest communality of attitudes and of behavior 
as a result of the restrictive influences of cul- 
ture and society in personality development. 

The scale utilizes the direction of percep- 
tion technique of attitude measurement (1): 
It is a projective-type paper-and-pencil test. 
The content of the items of the scale was 
drawn from the following determinant areas 
of social conformity: (1.) moral values; (2.) 
positive goals; (3.) reality testing; (4.) 
ability to give affection; (5.) tension level; 
and (6.) impulsivity. 

Examples of the type of items are:

Statistics show what percentage of men like
to write things on the walls in men’s
rooms?

(a) 27% (b) 40% (c) 53% (d) 66%
(e) 70%

Public Opinion Polls show what percentage
of people feel it is silly to make close
friendships because few people can really
understand you?

(a) 30% (b) 40% (c) 50% (d) 60%
(e) 70%

The scoring of the 37 items of the scale is based upon a weighted key
determined by previous experimentation (2). Validation of the scale was
determined by behavioral criteria (2). The criterion groups used were
adult male and female prison inmates; youth male prison inmates; and
regular white Protestant church-going groups; other groups used were
college populations of all ages and both sexes and police officers of the
Los Angeles environs.

The scale is an attempt at approaching a dimension of personality from a
different view than is usual in most personality tests. It attempts to
measure an aspect of personality organization as reflected in attitudes
derived from cultural and societal influences. In addition, it is an
indirect method of attitude measurement.

The problem with which we are concerned is: What relationships exist
between this scale and other direct measures of personality?


Procedure

Subjects. The subjects utilized to relate the measure of social
conformity to other differing personality measures were 89 female social
welfare case workers and supervisors from the Los Angeles County Bureau
of Public Assistance. They extend broadly as to age and work experience.

Method. The subjects were administered the SC scale and the
Guilford-Zimmerman Temperament Survey (GZ) (4). This latter personality
scale is a direct method of questioning. The scale is broken down into
ten traits which were derived by factor analysis.


Table 1


Mean Scores and Sigmas of the Social Welfare Case Worker Group (N = 89) Compared to the
GZ Women Norms and a Standard Population Group on SC


Traits sC 

G R* A or o* F T P M ji 

N M 18.2 184 134 199 192 197 16.8 17.9 18.6 n e 

rs S.D. “2 8S 49 SR oe S 56 45 56 ™ yá 

Normative M 17.0 158 137 196 155 168 15.7 18.1 17.6 10 ; 
Groups SD. of MF SS 65 Sy sa ag ap aa * 

* Signif. Di Fj i 
co ie inher 300 fer sc. oa 389 for ie erating traits 




Table 2 


Intercorrelation Matrix of the GZ Traits and SC 
(Pearson Product-Moment Coefficients) 


G 
> R A s E o F T P M sc 
: = .26 37 17 BR =- —14 
E i f n= = 
x 26 = -B =A 14 27 45 27 i y 7 
A B - o a 1 2 43 ago seal 
i 5 2B 02 —.19 if 1 
Fo = = 54 2 9 13 18 09 
E ‘3 a -21 18 05 —.06 93: -w -A 
6 . ü go a => 2 A 85 “a, o a f 
E r 3 = ; 29 =.19 
F zio 7 02 AS a7 = sf a 2 at, aap 
3 a 45 —.19 05 15 57 = 33 50 a —.25 
z $8 27 ‘7 =06 -35 -26 — 13 = =23 P T 
M = 43 13 23 Ad 72 50 03 = 7 B 
Sc o Coon aa g a 2 2 a = E 
10 —.19 6 =a a <4? —25 Io à =26 —.15 = 


They are: (1.) General activity (G); (2.) Restraint (R); (3.) Ascendance
(A); (4.) Sociability (S); (5.) Emotional stability (E); (6.) Objectivity
(O); (7.) Friendliness (F); (8.) Thoughtfulness (T); (9.) Personal
relations (P); and (10.) Masculinity (M).

A high score on any trait of the GZ indicates a “positive” quality. A
high score on the SC indicates a socially undesirable degree of
nonconformity.


Results 


Table 1 indicates the mean scores and sigmas obtained in this study
compared to the GZ statistics for the women (4). In the case of the SC
data, the sample is compared to statistics derived from a standard
population (2). The social welfare case workers appear to be higher on
the R, E, and O traits and similar to the normative population for all
others.

Table 2 presents a matrix of the intercorrelations of the GZ traits and
SC. The intercorrelations of the GZ traits derived from this sample show
a considerable amount of variable difference when compared to the GZ data
(3) which are based upon male subjects and are tetrachoric coefficients.
However, one must realize that both samples are highly select. This would
indicate the possible high degree of probable variation in results one
could expect of the GZ scale in terms of differential descriptive
characteristics of a population.

The Gengerelli method of “factor exhaustion” was used to find what
factors or subtests of the GZ scale provide maximum prediction for the SC
measure but no significant increment in prediction in addition to that
between O and SC was obtained.


Summary 


1. A group of 89 female social welfare case
workers and supervisors were administered
the Guilford-Zimmerman Temperament Survey
and a social conformity scale. The purpose
of the study was to find personality correlates
of social conformity.

2. The sample obtained in this study was 
highly select and differed somewhat from a 
normative female population on the GZ scale, 
being significantly higher on restraint, emo- 
tional stability, and objectivity. The inter- 
correlations between the traits for this sam- 
ple also differed considerably from those pre- 
sented by the authors of the GZ scale. 

3. The relationship between the two scales 
appears to be limited to the — .47 correlation 
between the Objectivity factor on the GZ and 


social conformity. 
Received July 2, 1953. 
References 


1. Bernberg, R. E. The direction of perception technique
of attitude measurement. Int. J. Opin.
Attit. Res., 1951, 5, 397-406.

2. Bernberg, R. E. A measure of social conformity.
J. Pers., in press.

3. Gengerelli, J. A. A method of analysis in which
the factors are empirical tests. J. Psychol.,
1952, 33, 159-174.

4. Guilford, J. P. and Zimmerman, W. S. The
Guilford-Zimmerman temperament survey.
(Manual of instructions and interpretations)
Sheridan Supply Co., Beverly Hills, Calif.
1949.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954 


Peer Nominations on Leadership as a Predictor of the Pass-Fail 
Criterion in Naval Air Training * 


E. P. Hollander *; 


U. S. Naval School of Aviation Medicine, Pensacola, Florida 


Studies by Williams and Leavitt (5) at the 
Marine Corps Officer Candidate School, by 
Wherry and Fryer (4) at the Signal Corps 
Officer Candidate School, and, more recently, 
by McClure, Tupes, and Dailey (3) at the 
Air Force Officer Candidate School have lent 
substantiation to the validity of peer nomina- 
tions on leadership against various perform- 
ance and operational criteria. 

Summing up their findings, based on a fac- 
tor analysis, Wherry and Fryer conclude that 
“Buddy ratings appear to be the purest meas- 
ure of ‘leadership.’ . . . Nominations by class 
appear to be better measures of the leadership 
factor than any other variable” (4, p. 157). 
Williams and Leavitt note that “. . . socio- 
metric group opinion was a more valid pre- 
dictor both of success in Officer Candidate 
School and of combat performance than sev- 
eral objective tests” (5, p. 291). They con- 
clude that the relative superiority of group 
opinion is attributable to the fact that “group 
members have more time to observe each 
other than do superior officers, they know 
each other in a realistic social context, and 
they react directly to each other’s social- 
dominance behavior. All these are condi- 
tions favorable to informed judgment” (5, 
p. 291). 

In addition to adequately fulfilling conditions
of validity, the nominating technique
has been found to meet acceptable standards
of reliability. Thus, Wherry and Fryer report
that “. . . the reliability of nominations
after four months is outstandingly higher 
than that of any of the other variables upon 
which the test was made. This is probably 

* Now in the Department of Psychology, Carnegie 
Institute of Technology. 


t Grateful acknowledgment is extended to E. R. 
Sausser, Jr. for his valuable aid in the pursuance of 
this study, 

* Opinions or conclusions contained in this report 
are those of the author. They are not to be con- 
strued as necessarily reflecting the view or the en- 
dorsement of the Navy Department. 


further evidence of the fact that the nominating technique has the property of early identification of the members of the group who constitute the two extremes of the leadership distribution” (4, p. 159). In an evaluation of peer ratings among Marine Corps trainees, an average reliability coefficient over a two-week period of … is reported by Anderhalter et al. (1, p. 26).


Problem

The evidence supporting the validity of the peer nomination technique is clear-cut. Heretofore, however, the criteria utilized in validation have quite properly tended to be directly related to the initial characteristic of nomination. It has been assumed, with good reason, that peer nominations on leadership should be expected to correlate with a criterion derived from some variety of leadership behavior or performance. On the other hand, there exists very little research regarding the applicability of peer nominations on leadership to performance or operational criteria presumably beyond leadership behavior. It may well be that so-called “leadership nominations” identify characteristics of the individual which relate to criteria in the spheres of cognition, personal adjustment, or such a complex as ability to successfully solo an aircraft. With this prospect in view, the current investigation set forth to explore a fundamental relationship, that is, peer nominations on leadership, during pre-flight school, and success or failure through the whole of flight training. Fundamentally, two questions were posed: Do peer nominations on leadership during pre-flight correlate significantly with a pass-fail criterion for the entire flight training program? And, if so, how well do these nominations predict this criterion compared to other variables from the same stage of training, pre-flight?


Peer Nominations on Leadership


Procedure 


The 268 Naval Aviation Cadets who entered pre-flight training during late 1951 were taken as the study sample. This group consisted of nine consecutively formed “sections” of about thirty cadets each. The cadets had already been preselected for the training program on criteria of physical fitness, age, minimum educational level, intelligence, mechanical aptitude, and background characteristics.

At the close of their third month of pre-flight, each section was administered a leadership nomination form which presented the individual cadet with a list of his sectionmates from which he was asked to nominate the three men from the section best qualified for the hypothetical position of “student commander” and the three men least qualified. Furthermore, the instructions specifically stated that the nominator was to evaluate his nominees with regard to their “present and eventual success as military leaders.” In this way, it was anticipated that confusion regarding the “leadership standard” to be applied would be obviated. It should be noted, too, that cadets were directed to ignore athletic ability as a factor in their nominations. This was considered to be a necessary and desirable part of the set in order to place some control on an ability which seemed likely to be closely related to nominations. Nominations were weighted +3 for highest, +2 for second highest, and +1 for third highest; similarly, weights of -3, -2, and -1 were assigned for the three corresponding “low” categories. A summation of these weights for each cadet was then taken as his leadership nomination score. The distribution of nomination scores yielded a unimodal and approximately symmetrical distribution. A standard score transformation was then utilized to afford a comparable index of the cadet’s relative standing on leadership within his own section.

In addition to this leadership score (LDR) derived from peer nominations, a number of other measures on the cadets were available from pre-flight. These were: ACE (College Level) Test scores obtained during the cadet’s first week in training; Officer-Like-Qualities score (OLQ) assigned at the end of pre-flight by the officers in command to evaluate the cadet on qualities of leadership, military bearing, discipline and the like; and final pre-flight average (F.AV.) based upon performance in all courses.
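
The nomination scoring described in the procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy section, and the ballots are hypothetical, and the "standard score transformation" is assumed here to be an ordinary z-score computed within each section, which the text does not specify.

```python
from statistics import mean, pstdev

HIGH_WEIGHTS = (3, 2, 1)     # first, second, third "best qualified" choice
LOW_WEIGHTS = (-3, -2, -1)   # first, second, third "least qualified" choice


def nomination_totals(section, ballots):
    """Sum the weights each cadet receives across all ballots in one section."""
    totals = {cadet: 0 for cadet in section}
    for high_choices, low_choices in ballots:
        for cadet, weight in zip(high_choices, HIGH_WEIGHTS):
            totals[cadet] += weight
        for cadet, weight in zip(low_choices, LOW_WEIGHTS):
            totals[cadet] += weight
    return totals


def standard_scores(totals):
    """Convert raw totals to z-scores within the section (assumed transformation)."""
    values = list(totals.values())
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        return {cadet: 0.0 for cadet in totals}
    return {cadet: (total - centre) / spread for cadet, total in totals.items()}


# Hypothetical toy section of seven cadets and two ballots (not data from the study).
section = ["A", "B", "C", "D", "E", "F", "G"]
ballots = [
    (("A", "B", "C"), ("G", "F", "E")),
    (("B", "A", "D"), ("G", "E", "F")),
]
totals = nomination_totals(section, ballots)
ldr = standard_scores(totals)
```

Standardizing within the section rather than across the whole sample mirrors the stated aim of indexing a cadet's relative standing "within his own section."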


This technique derives substantially from one developed as part of ONR Contract No. N onr-… by Richardson, Bellows, Henry and Company.

It should be noted that at no time were scores achieved by cadets on peer nominations available to authorities in the Training Command who assign OLQ or performance grades to cadets. Moreover, the cadets themselves did not know the ACE scores, final grades, or OLQ grades of their sectionmates at the time they made their nominations on leadership.




After a period of some eighteen months had elapsed, a follow-up of the study sample revealed that of the 268 cadets involved, 179 had passed flight training and had received their wings, 32 had failed flight training, 28 had withdrawn from training voluntarily, and the balance, 29 cadets, had been separated from the training program as a result of physical disqualification, illness, violation of contract, or some similar reason. With criterion groups thus established, a matrix of intercorrelations among the predictor variables was constructed and biserial r’s were computed for each of these variables against the pass-fail criterion.

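
For readers unfamiliar with the biserial r, a minimal sketch of the usual textbook computation follows. It is an assumption that the author used this exact form; the function name and the four-case toy data are hypothetical. The 179 passes and 32 failures give the N of 211 quoted for the validity coefficients.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, passed):
    """Biserial correlation between a continuous score and a pass-fail split.

    Uses r_bis = ((M_pass - M_fail) / SD_total) * (p * q / y), where y is the
    ordinate of the unit normal curve at the point dividing p from q.
    """
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    p = len(pass_scores) / len(scores)
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(pass_scores) - mean(fail_scores)) / pstdev(scores) * (p * q / y)


# Proportion passing among the cadets with a pass-fail outcome (179 + 32 = 211).
p_pass = 179 / (179 + 32)

# Hypothetical toy data, only to show the call.
toy_r = biserial_r([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
```

With very small or non-normal samples the biserial r can exceed 1, as the toy data show; it is meaningful only when the dichotomy can be assumed to overlie a roughly normal continuum.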
Findings 

Table 1 presents the matrix of intercorrela- 
tions, validity coefficients, and beta weights 
for the four predictor variables. Among 
these, it is apparent that final pre-flight av- 
erage (F.AV.) predicts the pass-fail criterion 
at the highest relative level and with the 
greatest weight. This tends to reinforce the 
finding of an earlier study on pre-flight 
grades as predictors of flight performance 
(2). Second to final average, however, is the 
leadership score (LDR) which the cadet re- 
ceived from the nominations made by his 
sectionmates before he entered the flight 
phase of training, and well over a year prior 
to the time he might receive his wings. It 
should be noted, too, that the magnitude of 
the difference between the validity coefficients 
for F.AV. and LDR may readily be ascribed 
to chance fluctuations. Superiors’ ratings on 
qualities related to leadership (OLQ) yield a 
validity coefficient which is positive but non- 
significant statistically; its beta weight is of 
a relatively low order as well. Scores on the ACE Test appear to have decidedly limited predictive value against the criterion. On the whole, then, the validity coefficients and beta weights for final pre-flight average and peer nominations on leadership suggest that these two variables are of greatest relative validity among those considered from the pre-flight level of training. The multiple R obtained for these variables was calculated to be .33.

Discussion 


Considering the highly select nature of the 
population from which the sample was drawn, 
the complexity of the criterion applied, and the 
time differential between the predictor and cri- 


E. P. Hollander

Table 1

Intercorrelations, Validity Coefficients, and Beta Weights for Four Predictor Variables from Pre-Flight Against a Pass-Fail Criterion from Flight Training †
Pass-Fail Crit. Bwn 
LDR ACE OLQ F.AV. (ris) 
Peer Nominations (268) (268) (188) y 207 
on Leadership — 30** E-i a 50** 27 
ACE Test (239) (239) 089 
(College Level) — li F. K bad .07 f 
Officer-Like- GO) m 066 
Qualities Grade — .58; š 
Final Average at n 252 l 
Pre-Flight — 2 
te-Flig) N = 268 N = 239 N = 188 N= 211 p= 
*3% 12 *5% 13 "3% 15 "5% .21 i 
**1% 16 “1% 17 = **1% 20 9, 27 


† In each case, the correlation coefficients reported are positive. The numbers in parentheses indicate the number of cases upon which the r is based. All validity coefficients have an N of 211.


terion variables, the multiple of .33 takes on stature. The fact, too, that under these conditions peer nominations on leadership should predict the criterion is still more surprising. While a coefficient of .27, accounting for approximately 7% of the variance, is not striking by itself, in relative terms it suggests that peer nominations at an early level of training may account for unique variance in predicting the criterion. A number of hypotheses are entertained below in an attempt to derive meaning from the obtained relationship.

In the first place, it may be asserted as a reasonable assumption that peer nominations on leadership are apt to be … Their assimilation within cadet groups may be limited and their probability of … be diminished correspondingly. If it is further … withdraw criterion should yield an … the coefficient secured with the … pass-fail criterion. A test of this hypothesis, by actual computation of this coefficient, yielded an r of .07; the hypothesis was accordingly rejected.

This leads to the consideration that perhaps a record of inadequate achievement at pre-flight, by the then potential failures, is of significance in determining their leadership scores. The correlation of .50 between final … credence to the … hypothesis is basically … it does not completely or satisfactorily …

… final average, … different frame of reference, is that … and who may consequently be expected to secure … flight training. This influence may be particularly felt when a cadet is in difficulty and is presented before a board of officers to determine whether he is to be failed from flight training. … board such as this or on the flight line, … conceivable, therefore, that the obtained predictive … nominations might be accounted for … some pervasive value through training … characteristics such as these.

From this discussion, the points of conjecture … first, that the leadership quality defined by peer nominations subsumes characteristics which are intrinsically related to the successful completion of flight training and, second, that peer nominations tap a facet of the individual which is also perceived and reacted to by those who evaluate his performance in flight training. These categories certainly need not be conceived of as mutually exclusive of each other.

Whatever factors may be found to underlie the relationship between peer nominations on leadership and the pass-fail criterion from flight training, it is fundamentally true that neither variable is of a simple, unidimensional structure. In order, therefore, to distill out their commonality in psychological terms, further research is indicated. It would appear reasonable that the first step in such a direction would be to have nominators verbalize the criteria by which they make their judgments of leadership. Beyond this, it would also be desirable to undertake full-scale research with a peer nomination form specific to the nominator’s estimate of the nominee’s potential for successful completion of flight training.

In any event, it seems likely that the peer nomination technique may have utility far exceeding current practice or expectation. The “informed judgment” of group opinion might well be profitably exploited further.
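
The statement in the discussion that a coefficient of .27 accounts for approximately 7% of the variance is the square of the coefficient. The short sketch below repeats that arithmetic for the coefficients quoted in the text; the function and dictionary names are hypothetical.

```python
def variance_accounted_for(r):
    """Proportion of criterion variance associated with a correlation r."""
    return r * r


# Validity coefficients and multiple R as quoted in the text.
coefficients = {"LDR": 0.27, "F.AV.": 0.28, "OLQ": 0.18, "ACE": 0.07, "multiple R": 0.33}
shares = {name: round(variance_accounted_for(r), 4) for name, r in coefficients.items()}
```

On this reckoning the multiple R of .33 corresponds to roughly 11% of the criterion variance, against roughly 7% for peer nominations alone.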


Summary 


A study was conducted to determine the relationship between peer nominations on leadership during pre-flight and a pass-fail criterion from Naval Air Training. At the end of three months of pre-flight training, nine sections of Naval Aviation Cadets, a sample of 268 cases, were asked to nominate members of their section as best or least qualified for a military leadership position. Leadership scores were derived for each cadet. Three other scores were also obtained for the cadets from the pre-flight level of training: ACE Test; Officer-Like-Qualities grade (OLQ), assigned by officers in charge; and final over-all pre-flight average (F.AV.). Biserial r’s were computed for each of these variables against pass-fail criterion data from flight training. Appropriate beta weights were also derived and a multiple R calculated.

The findings of this study were these:

1. Peer nominations on leadership (LDR) predicted the pass-fail flight criterion at a significant level (r = .27).

2. However, final pre-flight average (r = .28) was of virtually equal value as a predictor.

3. Neither OLQ (r = .18) nor ACE Test (r = .07) predicted the flight criterion significantly.

4. The multiple R for these four predictor variables against the criterion was .33. The beta weights obtained indicated that LDR and F.AV. were bearing the load of prediction.

It was concluded that peer nominations on 
leadership, at the pre-flight level, might hold 
unique variance in predicting the pass-fail 
flight criterion. This was tentatively held to 
be attributable to the dual considerations 
that: first, peer nominations might subsume 
characteristics intrinsically related to success 
in flight training and, second, that peer nomi- 
nations might tap a facet of the individual 
which is also perceived and reacted to by 
those who evaluate performance in flight 
training. Some implications for subsequent 
research were delineated with the suggestion 
that this technique be applied further. 


Received July 24, 1953. 


References 


1. Anderhalter, O. F., Wilkins, W. L., and Rigby, M. K. Peer ratings. Technical Report No. 2. St. Louis: St. Louis University, 30 November 1952.

2. Hollander, E. P. An investigation of the relationship between academic performance in pre-flight and success or failure in basic flight training. Project No. NM 001 058.17.01. Pensacola, Fla.: U. S. Naval School of Aviation Medicine, 24 November 1952.

3. McClure, G. E., Tupes, E. C., and Dailey, J. T. Research on criteria of officer effectiveness. Res. Bull. 51-8. San Antonio: Human Resources Research Center, Lackland Air Force Base, May 1951.

4. Wherry, R. J. and Fryer, D. H. Buddy ratings: popularity contest or leadership criterion? Personnel Psychol., 1949, 2, 147-159.

5. Williams, S. B. and Leavitt, H. J. Group opinion as a predictor of military leadership. J. consult. Psychol., 1947, 11, 283-291.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


The Retest Consistency of Army Alpha after Thirty Years * 


William A. Owens, Jr. 
The Iowa State College 


Personnel decisions are made every day 
which imply the long-term consistency of re- 
sults obtained from group tests of intelligence. 
It is, however, relatively rare to be able to re- 
test a group of adults with the identical meas- 
uring instrument originally employed and 
over a period of time equal to only a little 
less than one-half the average life span. The 
present paper is therefore devoted to a brief 
description of some results obtained under 
these conditions. 


Table 1 


Test-Retest Correlations of Army Alpha Subtests and 
Total Score after Thirty Years for 127 Iowa 
State College Freshmen Men 


Subtests 


Fit To-c Th 

1. Following directions -30 AD = 
2. Arithmetical reasoning 69 JI = 
3. Practical judgment .56 .93 — 
4. Opposites 64 93 -63 
5. Disarranged sentences AS 87 62 
6. Number series completion _ 62 -76 = 
7. Analogies .56 96 84 
8. Information -63 13 69 
Total Score Ah 97 =_ 


The data were gathered in connection with 
an investigation of the effects of age upon 
mental abilities; this research has been re- 
ported in detail elsewhere by Owens (2). In 
this context, 127 males of mean age nineteen 
years who had originally taken Army Alpha, 




Form 6, as freshmen at Iowa State College during early 1919 were retested with identical copies of this same examination during 1950.

The results of this testing and retesting have been incorporated in the two tables which follow. Table 1 shows the correlations for each subtest. Table 2 shows a condensed scatter table for Total Score. Taken together, they indicate rather remarkable consistency, since the range of talent in a college population largely composed of graduates 1 is greatly restricted, and since the basic research previously mentioned suggests that the thirty-year age increment involved did affect individuals differentially. In each instance the retest coefficients 2 (r_tt) may be compared with corrected odd-even coefficients from the 1919 testing (r_o-e); and, since these latter are considerably inflated by undue speeding in at least three instances, Gulliksen lower limit estimates (formula 24) (1) are also included where appropriate.

If a conclusion is warranted, it would seem to be that personnel decisions based upon the long-term consistency of results obtained from our better intellective tests are reasonably well founded.

Received June 26, 1953. 


References

1. Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. Pp. 236-238.

2. Owens, W. A., Jr. Age and mental abilities: a longitudinal study. Genet. Psychol. Monogr., 1953, 48, 3-54.

* The basic data upon which this paper is based were gathered under a grant from The Office of Naval Research. The opinions and assertions contained herein are not to be construed as official or as reflecting the views of the Navy Department or the Naval Service.

2 All stated magnitudes are product-moment correlations computed from normalized standard scores and not from the grouped raw scores as presented.

Table 2

1919 vs. 1950 Army Alpha Scores: Total Score for 127 Iowa State College Freshmen Men

1919 1950 Score ott! 
Score 50-74 75-99 100-124 125-149 150-174 175-200 4 
175-200 3h 
150-174 j E | Š 8 

S2 4 5 y 
125-149 i [ia | 7 15 30 
100-124 i [r = . 6 
75-99 = a 19 7 7 
50-74 a 1 A 

12 

sual g 5 14 38 48 22 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


Reliability and Validity of the Kopas Personnel Test Battery 


Philip Ash 
Inland Steel Company, East Chicago, Indiana 


The Kopas Personnel Test Battery (3) includes seven tests which purport to measure (A) Ability to Think in Mechanical Terms; (B) Knowledge of Math and Science; (C) Preference for Non-Routine Work; (D) Ability to Get Along with People; (E) Emotional Stability; (F) Ambition; and (X) Manual Coordination. A novel characteristic of these tests (except the Manual Coordination Test) is their method of administration and scoring. The test questions are printed on … mounted on panels. The answers to each question are printed around a dial, and the examinee indicates his choice by turning a pointer to the proper point of the dial. The tests are scored by noting the position of the pointers on the back of the board where the correct dial positions are marked. The device eliminates answer forms and “pencil and paper” administration. It also fails to provide a permanent record of item responses. Furthermore, the equipment necessitates individual administration.

Some of these tests are novel in intention and definitions, although it is not clear that items in them (e.g., Ambition, Emotional Stability) measure what the names claim they measure.

Neither the Test Manual (3) nor a mimeographed bulletin by the author on personnel testing (4) includes any reliability data, and very sparse validity information. Only one published study, on the test’s reliability (1), seems to have appeared. In it, Baxter reported that “correspondence with companies reported as users of the tests revealed little conclusive data on validity and no data on reliability.”

The bulletin on the tests (4) suggests that the original validation was based on a sample of 900 employed workers, but no validity coefficients are offered.

This paper summarizes several reliability studies and one attempt to evaluate the test’s validity.


Reliability 


Baxter (1) calculated both split-half and 
Kuder-Richardson reliabilities for a sample 
of 100 applicants for hourly positions at the 
Owens-Corning Fiberglas Corporation. Chew 
(2) calculated split-half coefficients for a 
sample of 249 male applicants of a steel mill. 
On a sample of steel mill applicants (Inland 
Steel Company) who were hired, test-retest 
coefficients were computed. For this sample, 
the retest interval was three months. 

The results of these studies are summarized 
in Table 1. In general these results are 
fairly consistent, and suggest that the tests 
are not reliable enough for individual predic- 
tion. No reliability study is reported for the 
Manual Coordination test. 
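
The split-half coefficients cited above are labeled "corrected" in Table 1. The usual correction for split-half reliability is the Spearman-Brown formula, and the sketch below assumes that this is what was applied; the source does not name the formula, and the function name is hypothetical.

```python
def spearman_brown(half_test_r, length_factor=2.0):
    """Step a reliability up (or down) to a test `length_factor` times as long.

    With length_factor = 2 this is the familiar split-half correction
    r_full = 2 * r_half / (1 + r_half).
    """
    return length_factor * half_test_r / (1.0 + (length_factor - 1.0) * half_test_r)


# A correlation of .50 between two half-tests implies about .67 for the full test.
corrected = spearman_brown(0.50)
```

Because the correction always raises a positive half-test correlation, corrected split-half values are not directly comparable with uncorrected test-retest values.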


Validity 


The battery, including the Manual Coordination Test, was given to a sample of 88 employed plant protection officers for whom supervisory ratings were collected. For the ratings, the plant protection officers were divided into six groups: the top (best) sixth of the force, the second sixth, and so on down to the bottom (worst) sixth.

The score for test X, Manual Coordination, is originally a time score, ranging from about two to six minutes. Since time scores are not readily amenable to treatment as moment


Table 1 


Coefficients of Reliability for Kopas Subtests


Inland 

Baxter Baxter Chew Steel 
Split-Half Kuder- Split-Half Test- 

Test (Corrected) Richardson (Corrected) Retest 
A 59 69 66 40 
B At -16 .88 16 
C 87 86 93 31 
D 59 58 64 65 
E .58 69 12 O 
F 8S 82 91 87 



Table 2 


Means, Standard Deviations, and Intercorrelations of the Subtests of the 
$ i Kopas Battery and Supervisory Ratings 
(Sample: 88 Plant Protection Officers) 


Tests E 
Test A Cc D E F x 
A. Mechanical — 
B. Math and Science -60 — 
C. Ability to Get Along with People 24 35 — 
D. Preference for Non-Routine Work —.07 28 16 — 
E. Emotional Stability —.05 =.15 —.30 05 — 
F. Ambition AS 02 23 00 .01 a = 
X. Manual Coordination .23 .09 120 =0 -0 -0 a2 = 
I. Supervisory Ratings .20 28 26 25 —.15 10 2 Pi 
Mean 15.1 148 175 225 481 438 E 12 
S.D. 5.0 10. 


11.8 6.5 6.0 9.2 


statistics, these scores were converted to speed scores:

Speed Score = … / time score

The intercorrelation matrix is reported in Table 2. Considered as a battery, the test intercorrelations are satisfactorily low. With the exception of the correlation of .60 between tests A (Mechanical) and B (Math and Science), these intercorrelations are generally not significantly greater than zero.

However, the criterion correlations are all 
equally low. A multiple 


The conclusion 
battery does not succes 


of some of the tests 
bition” as a function o 
leisure-time interests) 
necessary. The uncer 
tests in the battery, 
they hold litt] 
tools, 


Summary 


1. In two independent samples, the internal consistency reliabilities of the six … tests in the Kopas Battery ranged from .58 to .93. In another sample, test-retest reliabilities ranged from .40 to …

2. For a sample of 88 plant protection officers, the seven tests in the battery were … correlated with one another, and correlations with Supervisory Ratings were negligible. The multiple correlation was .348.

3. These results indicate the need for careful validation, if the tests are to be used. They suggest, in view of the … on test reliability, that the battery may not prove to be of significant value in employee selection.


Received June 29, 1953, 


References

1. Baxter, B. Reliability and validity of the Kopas Wage Earner battery of tests. J. appl. Psychol., 1947, 31, 39-43.

2. Chew, W. B. Internal consistency and item … the Kopas Personnel Tests as administered to … applicants of a steel mill. Unpublished master’s thesis, Purdue University, … 1951.

3. Kopas Personnel Tests. Cleveland: … Personnel Corporation, 1946.

4. Kopas, J. S. The development and … modern personnel programs (mimeographed). … 1942.

5. Stead, W. H., Shartle, C. L., et al. Occupational counseling techniques. New York: American Book Company, 1940. (Appendix …)

THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


Response Reliability of the Activity Vector Analysis 


James N. Mosel 


The George Washington University 


The Activity Vector Analysis (AVA) 1 is a personality appraisal instrument which appears to be gaining wide popularity in industrial personnel circles. Although the test has received notice in business and industrial publications, there have been no reports in the professional psychological literature on its reliability or validity. The only study of which the writer is aware is an abstract by Dorcus and Jones (1, p. 402) of unpublished data on machinists received from the test’s originator, W. V. Clarke. These data showed some validity against the criterion of supervisors’ ratings.

The test consists of 81 descriptive adjectives, such as “easy-going,” “high-spirited,” “impulsive.” The testee is requested to check all items which have ever been used by anyone in describing him, and then from the same list all items which he believes are truly descriptive of himself. He draws a line through any item which he does not understand. (For convenience of reference, the two sets of checked items will be referred to as the “other” and “self” choices.)

The scoring and interpretation of the test can be learned only by undergoing training offered by the test’s originator. The results are presented in the form of a summary profile of six scores (an activity score and five vector scores), accompanied by an evaluative report covering the following topics: summary of major characteristics, environmental …tions, work requirements, social contacts, supervision required, accident potential, and a final over-all comment. This information is then related to the needs of a previously analyzed job.

In a test of this type there are two problems of reliability: the test-retest consistency of responses (“response reliability”) and the reliability of the interpreter’s judgments. It is evident that without adequate response reliability there can be no interpreter reliability nor can the interpretations have validity.

1 Copyright, 1950, by Walter V. Clarke.

The present paper reports some preliminary 
evidence on the question of response reli- 
ability. Since the method of scoring the 
AVA is not available in open source, the re- 
liability of the responses rather than the 
scores was the object of study. 


Procedure 


Fifty-two employed adults in evening classes 
of a university were administered the AVA 
on two occasions, separated by an interval of 
about two weeks. Instructions at the first 
administration were designed to conceal the 
fact that the test would be given a second 
time. 

As a measure of retest consistency, the 
common elements or overlap coefficient of 
correlation (2, pp. 120-123) was computed 
for each individual.” This coefficient is a 
measure of the extent to which an individual 
selects the same items on both trials; it is 
essentially the proportion of overlap between 
his first and second sets of choices. A simi- 
lar technique has been used by Zubin (3) in 
measuring the similarity between two indi- 
viduals on a check list. A value of 1 means 
that the same items were selected on both 
trials; zero means that there were no items 


common to both trials. 
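
The overlap coefficient described above can be illustrated with a short sketch. It assumes the common-elements form, the number of items common to both trials divided by the square root of the product of the numbers chosen on each trial; the function name and the adjective sets are hypothetical.

```python
from math import sqrt


def overlap_coefficient(first_trial, second_trial):
    """Common-elements coefficient: shared items over sqrt(n1 * n2)."""
    first, second = set(first_trial), set(second_trial)
    if not first or not second:
        raise ValueError("each trial must contain at least one checked item")
    return len(first & second) / sqrt(len(first) * len(second))


# Hypothetical check lists for one testee on the two administrations.
trial_1 = {"easy-going", "high-spirited", "impulsive", "patient"}
trial_2 = {"easy-going", "impulsive"}
r12 = overlap_coefficient(trial_1, trial_2)
```

The coefficient reaches 1 only when the two sets are identical, so a testee who checks more items on the retest is penalized even if every original choice is repeated.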
Results 


Table 1 shows the distribution of overlap coefficients for the “other” and “self” choices. There are large individual differences in consistency, the exact ranges being from .28 to .98 for the “other” choices and from .35 to .94 for the “self.”


As an approximation of a reliability coefficient for the entire group, the mean overlap coefficient was computed. The resulting values for the “other” and “self” choices were almost identical, .74 and .73. To the extent that these values can be considered as conventional reliability coefficients, they border on respectability; but the effect is reduced by the fact that for both sets of choices the majority of all overlap coefficients were less than .80.

2 The formula is: r12 = n12 / √(n1 n2), where n1 is the number of items chosen the first trial, n2 the number chosen the second trial, and n12 the number common to both trials.

It was found that there is only a moder- 
ate relationship between consistency on the 
“other” and “self” choices, the Pearson cor- 
relation between the two sets of overlap meas- 
ures being .57. 

Table 2 shows the means and variabilities 
of the number of items chosen on the first 
and second trials. It will be noticed that 
the mean number of items chosen increases 
slightly for both sets of choices on the second 
trial. Also, fewer items are chosen for the 
“self” choice than for the “other” choice; the 
variabilities are correspondingly smaller. 


Summary 


The AVA, an industrial personality test, 
requires the testee to select from 81 descrip- 
tive adjectives all those which anyone has 
ever used to describe him (“other” choices) 
and all those which he believes are truly de- 
scriptive of himself (“self” choices). The 
test-retest reliability of these choices was de- 
termined by means of the common elements 


Table 1 


Distribution of Overlap Correlation Coefficients
between First and Second Trials

                       Frequency
Intervals of r12    “Other”    “Self”

.90-.99                7          5
.80-.89               15         18
.70-.79               10         14
.60-.69               10          7
.50-.59                5          3
.40-.49                2          3
.30-.39                2          2
.20-.29                1          0

Total                 52         52




Table 2 


Means and Standard Deviations of the Number of AVA
Items Chosen on the First and Second Trials

                 “Other”    “Self”
First Trial
  Mean            34.6        …
  SD              17.0        …
Second Trial
  Mean            37.8        …
  SD              17.8        …


correlation coefficient (proportion of overlap) for each of 52 individuals.

There was considerable individual variability in consistency. While the mean coefficients of overlap for the two sets of choices were .74 and .73, in both sets of choices more than half of the coefficients were less than .80. Consistency on one set of choices, as indicated by the correlation between the overlap coefficients of the two sets of choices, was not closely associated with consistency on the other (r = .57).

It would appear from these results that interpretations of the AVA might be distorted by instability of response in an appreciable number of cases, the amount of distortion depending on how much is staked on any given item in interpretation. There is the possibility, of course, that retest consistency itself might prove a useful indicator of personality, but this is not taken advantage of in the present method of utilization.


Received July 17, 1953. 


References

1. Dorcus, R. and Jones, Margaret. Handbook of employee selection. New York: McGraw-Hill, 1950.

2. Peters, C. and Van Voorhis, W. Statistical procedures and their mathematical bases. New York: McGraw-Hill, 1940.

3. Zubin, J. A technique for measuring like-mindedness. J. abnorm. soc. Psychol., 1938, 33, 508-516.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


The Specialization Level Scale for the Strong Vocational Interest Blank

Milton G. Holmen 
AFF Human Research Unit No. 2, Fort Ord, California? 


A report describing the development of the specialization level
scales for the Strong Vocational Interest Blank and the Medical
Specialist Preference Blank for use in planning a medical career was
made by this writer in a recent monograph by Strong and Tucker (2).
The purpose of the present paper is to report an investigation into
the possibility of using the specialization level scale for the
Strong blank in the counseling of men who are not planning a medical
career.

It was originally conceived that a scale might be developed for use
in prediction of job satisfaction in the medical specialties. Such a
scale was developed by assigning weights to those items on the Strong
blank which showed differences between the responses of medical
specialists as a group and the responses of physicians-in-general.
Since this method "subtracted" the interests of one group of medical
men from those of another group of medical men, it was expected that
medical interest would be subtracted out, leaving a scale which might
measure interest in specialization in non-medical as well as in
medical areas.
¹ This research was conducted at Stanford University as a part of
the Medical Specialists Research Project, under Contract No.
W-49-007-MD-483 with the Surgeon General, U. S. Army. The opinions
expressed in this paper are those of the writer and do not
necessarily reflect those of the Department of the Army. This study
was a part of a doctoral dissertation (1). The writer wishes to
express appreciation to Prof. Donald W. Taylor and Col. Anthony C.
Tucker for assistance in conduct of the study, and to Dr. Edward K.
Strong, Jr. for suggestions on the research and use of data from his
files.

² A division of the Human Resources Research Office, The George
Washington University.


It may be of interest at this point to consider what kinds of items
are assigned different weights in the Specialization Level scale. On
quite a few items, such as the occupations of […] and author of a
technical book, a plus-weight is assigned to liking the item and a
minus-weight assigned to being indifferent or disliking. On others,
such as the occupations of bookkeeper, […], and bank teller,
disliking the item is assigned a plus-weight, with minus-weights
assigned to indifference or liking. Items in which the indifferent
response is not weighted are quite common, such as the occupation of
certified public accountant and the feeling toward pet canaries, for
which plus-weights are assigned to disliking and minus-weights to
liking. Liking of social problem movies is assigned a plus-weight;
disliking them is assigned a minus-weight. There are a few items,
such as "chopping wood" and "pet monkeys," for which only the
indifferent response is weighted. Poetry, smokers, and the study of
agriculture are assigned minus-weights for liking, plus-weights for
indifference, and no weight for disliking. On several items, such as
the occupations of music teacher and YMCA worker and the activity of
solving mechanical puzzles, plus-weights are assigned to the response
of indifferent and minus-weights to disliking.
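
The weighting scheme described above can be pictured as a lookup from (item, response) to a signed weight, summed over the blank. The sketch below is only an illustration: the article reports the signs of the weights but not their sizes, so the unit weights (+1, -1, 0), the dictionary layout, and the function name `raw_score` are all assumptions.

```python
# Signs follow the text; magnitudes (unit weights) are hypothetical.
WEIGHTS = {
    "author of a technical book": {"like": +1, "indifferent": -1, "dislike": -1},
    "bank teller":                {"like": -1, "indifferent": -1, "dislike": +1},
    "pet canaries":               {"like": -1, "indifferent": 0,  "dislike": +1},
    "social problem movies":      {"like": +1, "indifferent": 0,  "dislike": -1},
    "poetry":                     {"like": -1, "indifferent": +1, "dislike": 0},
}


def raw_score(responses):
    """Sum the weight attached to each item's chosen response."""
    return sum(WEIGHTS[item][choice] for item, choice in responses.items())


example = {
    "author of a technical book": "like",     # +1
    "bank teller": "dislike",                 # +1
    "pet canaries": "like",                   # -1
    "poetry": "indifferent",                  # +1
}
print(raw_score(example))
```

A real scoring key would cover every weighted item on the blank and would convert the raw sum to a standard score; neither step is shown here.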


Results 


Relationship between Specialization Level Scores and Educational
Level. Are scores on the specialization level scale related to amount
of education? Such a relationship might be expected since high scores
on the scale are obtained more often by persons engaged in
occupations for which a considerable amount of specialized training
is required. Mean scores on the scale were obtained for members of
fourteen occupational groups. These means were obtained by use of the
method recently reported by Strong (5) which provides mean scores on
any scale for any group from the summary data on the responses of
that group to each item on the Strong blank.

These fourteen occupational groups made up four subject-matter
clusters. It was predicted that within each cluster the
specialization level mean scores would correlate positively with the
mean educational levels. The correctness of this prediction is
indicated by the fact that, within each of the subject-matter
clusters, the mean specialization scores were arrayed in the same
order as the mean educational levels. The educational levels and
specialization level scores of these groups are presented in Table 1.


Table 1

Relationship between Mean Specialization Level
Scores and Mean Educational Levels

                            Mean              Mean
                            Specialization    Educational
                            Level Score       Level (years)

Medical Group
  Medical specialist            50.0             20*
  Physician                     39.5             19*
  Osteopath                     34.9             17.0
  Dentist                       32.8             14.9
Social Science Group
  Psychologist                  54.8             19.0
  Social science teacher        41.8             16.4
Accounting Group
  C.P.A.                        43.9             14.3
  Accountant                    39.6             12.3
  Office worker                 35.9             11.5
Physical Science Group
  Mathematician                 48.8             18.8
  Physicist                     48.8             18.5
  Chemist                       45.5             16.8
  Math-science teacher          42.1             16.4
  Engineer                      41.8             15.4

* Educational level of group estimated.

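
The ordering claim can be checked directly from the Table 1 figures. The following sketch sorts each cluster by mean educational level and confirms that the mean specialization scores never increase as education falls. The data are those printed in Table 1 (the two starred education values are the author's estimates); the variable and function names are illustrative only.

```python
# (group, mean specialization level score, mean years of education)
TABLE_1 = {
    "Medical": [
        ("Medical specialist", 50.0, 20.0),
        ("Physician", 39.5, 19.0),
        ("Osteopath", 34.9, 17.0),
        ("Dentist", 32.8, 14.9),
    ],
    "Social Science": [
        ("Psychologist", 54.8, 19.0),
        ("Social science teacher", 41.8, 16.4),
    ],
    "Accounting": [
        ("C.P.A.", 43.9, 14.3),
        ("Accountant", 39.6, 12.3),
        ("Office worker", 35.9, 11.5),
    ],
    "Physical Science": [
        ("Mathematician", 48.8, 18.8),
        ("Physicist", 48.8, 18.5),
        ("Chemist", 45.5, 16.8),
        ("Math-science teacher", 42.1, 16.4),
        ("Engineer", 41.8, 15.4),
    ],
}


def same_order(groups):
    """True if scores are non-increasing when groups are sorted by education."""
    by_education = sorted(groups, key=lambda g: g[2], reverse=True)
    scores = [g[1] for g in by_education]
    return all(a >= b for a, b in zip(scores, scores[1:]))


for cluster, groups in TABLE_1.items():
    print(cluster, same_order(groups))
```

The comparison uses `>=` rather than `>` because mathematicians and physicists have the same mean score (48.8), so the orders agree only in the weak sense for that pair.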
The groups used for the comparisons presented in Table 1 were those
used for the construction of occupational scales on the Strong blank
(4, pp. 694-717). Higher education-level means would undoubtedly be
obtained from present members of these occupational groups, but the
general trend would probably vary little from that indicated in
Table 1.

The relatively low scores of dentists and engineers may be taken to
indicate that the scales measure a kind of specialization emphasizing
theoretical rather than technical considerations. Study in the
occupations in which the highest scores were recorded ordinarily
involves more theoretical work than those in which lower scores were
obtained. Though the evidence on this point is far from conclusive,
the data suggest an hypothesis for further investigation.

The mean score of psychologists is of particular interest, since it
was the highest score obtained, even higher than that of the medical
specialists. An objection might be made to considering psychologists
and social science teachers as working in the same subject-matter
area, but the social science teacher group was the only one at all
appropriate on which data were available for making a comparison.
Specialization Level Scores and Success in Graduate School. The data
above indicate that a positive relationship exists between
specialization level scores and amount of education, but do not
provide as precise an estimate of what the scale measures as would be
desirable. The specialization level scores of subgroups within two
other occupations were therefore obtained to provide a more precise
estimate of what the scale measures.

Strong blanks (1927 edition) were obtained from two groups of former
students of the Stanford Graduate School of Business. These blanks
had been administered to the students during their first year in the
school. Seventy-five of the men who filled out the blanks were later
awarded the degree of Master of Business Administration (M.B.A.). The
other 75 had dropped out or failed before getting the M.B.A. The
groups were matched by year to equalize any differences that may have
existed from year to year in the admission policy of the School.
Blanks from the classes of 1929 through 1941 were used. The mean
standard score of men receiving the M.B.A. degree was […]; that of
men not getting the degree was 45.[…].

The Strong blanks (1927 edition) of 150 chemists were also scored on
this scale. The blanks used were a part of those used in the
development of the chemist scale for the Strong blank. Fifty of the
blanks were from men who had Ph.D. degrees or had completed at least
seven years of college training. Fifty were from chemists with Master
of Science degrees, or with five or six years of college training.
Finally, 50 were from chemists with three or four years of college
training. Most of this latter group held the degree of Bachelor of
Science. All 150 were members of the American Chemical Society at the
time of testing. None was a teacher or professor of chemistry.
The mean standard scores for the three groups of chemists on the
specialization level scale were as follows: Ph.D. group, 52.2; M.S.
group, 47.8; and B.S. group, 46.5. The standard deviations for the
three distributions were […], 8.3, and 8.1, respectively. The
critical ratio of the difference between the Ph.D. group and the M.S.
group was 2.65, a difference which would occur by chance less than
one time in a hundred. The critical ratio of the difference between
the Ph.D. group and the B.S. group was 3.20, which would occur by
chance less than one time in a thousand. The difference between the
M.S. group and the B.S. group was not significant (C. R. of .81), but
was in the expected direction.
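
As a rough check on these figures, a critical ratio can be computed as the difference between two means divided by the standard error of that difference. The article does not state which standard-error formula was used, so the sketch below assumes the usual one for independent groups; the function name is illustrative. Only the M.S. and B.S. comparison is shown, since both of those standard deviations are legible.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between means over the standard error of the difference
    (independent groups; an assumed formula, not stated in the article)."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# M.S. group: mean 47.8, S.D. 8.3; B.S. group: mean 46.5, S.D. 8.1; 50 men each.
cr_ms_vs_bs = critical_ratio(47.8, 8.3, 50, 46.5, 8.1, 50)
print(round(cr_ms_vs_bs, 2))
```

Under this assumption the ratio comes out near .79, close to the reported .81; the small gap is of the size that rounding of the published means and standard deviations would produce.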
An investigation was made to determine whether or not the differences
between groups of chemists with respect to specialization level
scores might be due to differences in age between members of the
groups. The mean age of the Ph.D. group was found to be 36.3 years at
the time of testing; the M.S. group averaged 34.0 years; and the B.S.
group averaged 35.7 years. None of the differences between pairs of
these means was found to be significant.

Why should scores on the specialization level scale be related to
amount of formal education for chemists but not for students of
business administration? One reason for this difference may be that
the two courses differ almost as much in purpose as in subject
matter. Generally speaking, the more training a chemist receives, the
more specialized that training becomes. Research for the master's
thesis usually involves a minimum of six months of work on a narrow,
specific phase of chemistry. At least a year is spent on a single
problem in preparation for the doctoral dissertation. The purpose of
the training for the M.B.A., on the other hand, is to provide a broad
education for business executives, not to provide training for
specialists in any one phase of business.

Research on these two groups suggests that 
the scale does not measure mere liking or 
tolerance for education, but willingness to re- 
strict one’s activities to a very narrow field. 
This, of course, is the very essence of spe- 
cialization. One might object that train- 
ing toward the doctoral degree in chemistry 
should not be compared with training for the 
master’s degree in business administration. 
However, the M.B.A. is the highest degree 
ordinarily granted in the field of business ad- 
ministration, except to persons who plan to 
teach business subjects in colleges and uni- 
versities. Furthermore, the difference found 
between Ph.D.’s and B.S.’s in chemistry was 
also found between M.S.’s and B.S.’s in that 
field, although the latter difference was less 
significant. Although this aspect of the re- 
search on the scale may not be considered 
conclusive, it appears on the basis of infor- 
mation available that the scale has been ap- 
propriately named. 

The question naturally arises, in connec- 
tion with such groups as chemists, whether 
the scale has any practical value. The data 
presented above indicate that the specializa- 
tion level may be of value in predicting 
whether a given student who plans to enter 
chemistry will enjoy the narrowing and 
“heightening” of work required of the Ph.D. 
However, if this scale is to be used as a basis 
for the counseling of a person planning to 
undertake a program of graduate study in 
chemistry, it must provide more information 
on this subject than does the chemistry scale. 
The correlation between the chemist and spe- 
cialization level scales (using the blanks of 
the 150 chemists discussed above) was found 
to be only .06, so the two scales certainly do 
not measure the same thing. 

To test whether the specialization level scale would provide more
information than would the chemist scale with respect to amount of
graduate study to plan for, two comparisons of the efficiency of
these two scales were made. Both comparisons involved finding the
significance of the difference between proportions of overlap for the
two scales (3, pp. 75-76). The proportion of overlap indicates the
proportion of persons in one group who would be classified as members
of another group on the basis of scores on the scale in question. The
first comparison made used the blanks of the 50 chemists described
above who had Ph.D. degrees or had completed at least seven years of
college training. The second involved the 50 chemists who had a B.S.
degree, or had completed only three or four years of college
training. Both groups were used in determining the cutting points on
which the proportions of overlap between them were based. For the
Ph.D. group, the proportion of overlap on the chemist scale was .42
and on the specialization level scale was .34. For the B.S. group,
the proportion of overlap for the chemist scale was .46 and for the
specialization level scale was .36. For both of these groups, the
differences between the two proportions of overlap were significant
at the .01 level.
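
The idea of a proportion of overlap can be made concrete with a small sketch. The article does not say how the cutting points were fixed, so the midpoint between the two group means is used here purely as an assumption, and the scores are invented for illustration; they are not data from the study.

```python
from statistics import mean


def proportion_of_overlap(group, other):
    """Share of `group` falling on the `other` side of the cutting point.

    The cutting point is taken as the midpoint of the two group means,
    which is an assumption made for this illustration.
    """
    cut = (mean(group) + mean(other)) / 2
    if mean(group) >= mean(other):
        misplaced = [score for score in group if score < cut]
    else:
        misplaced = [score for score in group if score >= cut]
    return len(misplaced) / len(group)


# Hypothetical standard scores for two small groups.
phd_scores = [58, 55, 49, 61, 44, 52]
bs_scores = [47, 50, 41, 45, 53, 39]

print(round(proportion_of_overlap(phd_scores, bs_scores), 2))
print(round(proportion_of_overlap(bs_scores, phd_scores), 2))
```

A scale that separates the groups better gives smaller proportions, which is the sense in which .34 and .36 for the specialization level scale compare favorably with .42 and .46 for the chemist scale.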

The data obtained from the blanks of chem- 
ists indicate that the specialization level scale 
can be used in at least one area outside the 
field of medicine. The data obtained from 
blanks of students of business administration 
point out the limitations in the use of the 
scale. It cannot be used to predict success 
in graduate school without consideration of 
what field the counselee plans to enter. 


Theoretical Implications 


The material presented suggests that a basic 
dimension of interests has been identified, 
however crudely. Within the subject-matter
areas tested, the specialization level scale ap- 
pears to separate those doing highly special- 
ized work requiring long training from those 
doing other kinds of work. It may be in- 
teresting to compare the specialization level 
scale with the occupational level scale. The 
occupational level scale separates business 
and professional men from those doing other 
work. The specialization level scale makes a 
similar sort of separation within some of the 
business and professional groups. That it 
measures something different from what is
measured by the occupational level scale is 
indicated by the fact that the correlation be- 
tween the two scales, based on the blanks of 




400 medical specialists, is only .07. The correlation
would undoubtedly be higher if it
were based on blanks of groups with a greater 
range of scores on these two scales, but the 
specialization level scale is of primary interest 
in groups for which the range of occupational 
level scores is restricted.

The material also suggests an extension of the concept of point of
reference as used in the development and interpretation of vocational
interest scales (4, pp. 553-576). The essential aspect of this
concept as developed by Strong is that the best reference group from
which to construct the scales used for scoring a given individual's
blank would be one which included all, and only, the occupations the
individual might consider entering. This is a somewhat idealistic
definition, as practical considerations prevent use of a different
set of scales for every person who takes the test. However, many of
the persons taking the test are nearly enough alike to consider
entering the broad group of occupations and professions represented
in the P reference group. This reference group consists of the men
engaged in the occupations college men ordinarily enter (4, pp.
712-713). The scales based on this group can be used only with
respect to blanks of persons who do belong, or may be expected to
belong, to the reference group on which these scales were based.

The suggestion made here is that scales can be developed which are
based on differences between two levels of one group, and that scores
on these scales can provide valuable information to persons not
members of the groups on which the scales were constructed. Scales
constructed to measure the differences between two subgroups within a
single occupational or professional group may provide measures that
are relatively independent of the occupations on which they were
constructed.


Summary 


The specialization level scale was developed for the Strong blank to
separate medical specialists from physicians-in-general. Research
reported here was undertaken to determine whether or not this scale
might provide useful information about other occupational groups, and
thus identify specialization level as a dimension of interests
comparable to occupational level.

Mean scores were obtained on the scale for ten occupational groups in
three non-medical subject-matter areas and four groups within the
field of medicine. Within each of these areas, the occupational
groups were ranked in the same order by specialization level mean
scores as by the mean educational level of their members. Further
research indicated that chemists with Ph.D. degrees could be
separated by this scale from those with less specialized training,
but that the scale did not differentiate students who had qualified
for the Master of Business Administration degree from those who had
entered training for this degree but had failed or dropped out of
school before receiving it.

While the evidence presented is not conclusive, it does indicate that
a dimension of interests has been identified and that further
research on the nature of this dimension is merited. It indicates
further that scales measuring intra-group differences may be of value
for predicting with respect to occupational groups not used in the
construction of the scales, provided norms are available for these
other occupational groups.


Received June 30, 1953. 


References 


1. Holmen, M. G. Vocational interest patterns of professional
specialists. Unpublished doctor's dissertation, Stanford Univer.,
1952.

2. Holmen, M. G. The Specialization Level Scale. In E. K. Strong, Jr.
and A. C. Tucker, The use of vocational interest scales in planning a
medical career. Psychol. Monogr., 1952, 66, No. 9 (Whole No. 341).

3. McNemar, Q. Psychological statistics. New York: Wiley, 1949.

4. Strong, E. K., Jr. Vocational interests of men and women.
Stanford: Stanford University Press, 1943.

5. Strong, E. K., Jr. Norms for Strong's Vocational Interest Tests.
J. appl. Psychol., 1951, 35, 50-56.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954 


Interest Patterns for Certain Degree Groups on the Lee-Thorpe 
Occupational Interest Inventory ¹


Andrew H. MacPhail 


Department of Education, Brown University 


For several years men students entering 
Brown University have been asked to fill out 
the Occupational Interest Inventory (Ad- 
vanced Series, Form A; Lee-Thorpe). The 
degree group patterns discussed here are 
based on scores made by 2,380 candidates for 
the A.B. degree, 170 for the Sc.B. in Chem- 
istry, and 578 for the Sc.B. in Engineering. 
For the purposes of this study it seems rea- 
sonable to consider these three degree candi- 
dacy groups as being validation groups. Cer- 
tainly this is true in the sense that students 
must meet specific requirements in order to 
be admitted to a particular degree candidacy, 
plus the effect of self-selection as manifested 
by the interest in seeking one degree rather 
than another. 

¹ Published by the California Test Bureau, Los Angeles 28,
California.

Means and standard deviations were computed for each degree group on
each part of the Inventory and the significance of the differences of
mean scores made by the several degree groups on each part of the
Inventory was then determined. Table 1 shows the pattern of mean
scores with percentile equivalents for each of the degree groups.
Some idea of the degree of overlap of groups may be inferred from the
data in this table. However, of the 30 critical ratios computed 22
were found to be significant at the one per cent level.

Table 2 shows that the mean scores made by the Arts group and
Engineer group differ by an amount significant at the one per cent
level on every part of the Inventory. The mean scores made by the
Arts group differ significantly at the one per cent level from those
made by the Chemist group on seven of the ten parts of the Inventory,
and on Level the difference is significant at the two per cent level.
On Nature and Computational the differences would not be considered
significantly great, according to current convention, since the level
of confidence does not even reach five per cent. In terms of mean
scores the Chemists are not as clearly differentiated from the
Engineers as either of these groups is from the Arts group. However,
one per cent confidence levels are reached on the Personal-Social,
Mechanical, Sciences, Verbal, and Manipulative parts. On Business the
three per cent level is reached but the differences on the other four
parts would not commonly be called significant.


Table 1

Degree Group Patterns on the California Occupational Interest
Inventory: Mean Scores *

                       Arts               Chemists            Engineers
                    (N = 2380)           (N = 170)            (N = 578)
                    Raw Scores           Raw Scores           Raw Scores
                   Mean  S.D.  %ile     Mean  S.D.  %ile     Mean  S.D.  %ile
Fields:
  Personal-Social  21.2   6.0  76.0     17.4   5.0  57.0       …     …     …
  Nature           18.6   7.2  37.0     19.3   7.0  41.0       …    6.6    …
  Mechanical       17.0   5.6  20.0     20.5   5.0  37.5     24.6   5.2    …
  Business         23.5   8.3  72.5     17.6   7.0  43.0     19.0   6.8    …
  Arts             21.6   7.3  67.0     16.1   5.9  40.5     16.7   6.0    …
  Sciences         21.3   7.0  26.5     32.1   5.4  80.5     28.3   5.7    …
Types:
  Verbal           13.3   4.4  79.3      8.5   3.5  57.5       …    3.5    …
  Manipulative     12.7   2.4  42.0     13.7   2.6  52.0     13.1   2.5    …
  Computational    10.3   4.2  58.0      9.9   3.6  54.5      9.7   3.5    …
Level              73.4   8.8  73.0     75.0   8.3  78.3     74.5   8.4    …

* Percentile equivalents are the publisher's.


Table 2

Critical Ratios (diff./PE diff.) for Arts-Chemists; Arts-Engineers;
Chemists-Engineers on the California Occupational Interest Inventory

Note: All ratios are significant at the 1 per cent level, or better,
except as indicated.

                     Arts-        Arts-         Chemists-
                     Chemists     Engineers     Engineers
Fields:
  Personal-Social       …            …             …
  Nature                …            …             …
  Mechanical          13.0         47.0          13.8
  Business           +15.4        +20.0           3.4†
  Arts                  …            …             …
  Sciences              …            …             …
Types:
  Verbal             +25.0        +48.0          +4.4
  Manipulative         6.6          4.5          +4.0
  Computational       +1.9‡        +5.1          +.99‡
Level                  3.5†         4.1           .96‡

† 3.4 is significant at the 3 per cent level, and 3.5 at the 2 per
cent level.
‡ Not significant at the 5 per cent level.
(Note: Differences in favor of the first member of the group, such as
Arts over Chemists, have a + sign in front of the critical ratio.
Differences in favor of the second member of the group, such as
Chemists over Arts, have no sign in front of the critical ratio.)


Table 3

Critical Ratios (diff./PE diff.) for Arts-Chemists; Arts-Engineers;
Chemists-Engineers on the California Occupational Interest Inventory

Columns: C.R.; Arts over Chemists; Arts over Engineers; Chemists over
Arts; Chemists over Engineers; Engineers over Arts; Engineers over
Chemists.

[Table body: the critical ratios of Table 2 in descending order, each
entered under the column of the favored group; the entries are not
legible in this copy.]

Note: All critical ratios over 3.5 in this table are significant at
the 1 per cent level, or better; 3.5 is significant at the 2 per cent
level, and 3.4 at the 3 per cent.

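
Table 2 expresses each critical ratio as a difference divided by the probable error of the difference, not by its standard error. A probable error is 0.6745 times the corresponding standard error, so these ratios run about half again as large as the more familiar ones. The sketch below assumes the usual standard error for independent groups and reads the Mechanical row of Table 1 as means of 17.0 (Arts) and 20.5 (Chemists) with standard deviations of 5.6 and 5.0; the decimal points are an inference from the scan, and the function name is illustrative.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def critical_ratio_pe(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between means divided by the probable error of the difference."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / (PROBABLE_ERROR_FACTOR * se_diff)


# Mechanical: Arts (N = 2380) against Chemists (N = 170).
cr_mechanical = critical_ratio_pe(17.0, 5.6, 2380, 20.5, 5.0, 170)
print(round(abs(cr_mechanical), 1))
```

Under these assumptions the ratio is close to the 13.0 shown for Mechanical in the Arts-Chemists column of Table 2. The negative sign of the raw result simply reflects that the Chemists have the higher mean, which Table 2 marks by omitting the plus sign.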

In Table 3 the 30 critical ratios, 22 of them significant at the one
per cent level, are arranged in descending order of magnitude. The
specific purpose of this table is to give emphasis to the relative
differential values of the ten parts of the Inventory with respect to
the three degree groups, and it is a very simple matter to discover
which part or parts of the Inventory have the greatest differential
value and between which degree groups. Thus, for example, the table
shows clearly that the three parts having the highest differential
value are Verbal, Mechanical, and Science and are effective in
distinguishing the Arts and Engineer groups.

Needless to say, in practical use the mean scores for the ten parts
of the Inventory (Table 1) would be rounded off to the nearest whole
unit. The writer has made considerable use of this Inventory in
student consultations and feels confident that the data presented
here will enhance its value for such use.²


Received December 16, 1953. 
Early publication. 


² E. M. Hess and E. C. Allison gave valued help in the computational
work involved in the conduct of this study.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


Reliability of Short Rating Scales and the Heterogeneity of the
Rated Stimuli


A. W. Bendig 
University of Pittsburgh


Two previous articles (1, 2) have reported on the relationship
between the reliability of self-rating scales and the number of
categories in the scale. Two types of internal consistency measures
of reliability were investigated: individual rater reliability, a
measure of the ability of single raters to discriminate differences
among the rated stimuli, and test reliability, a measure of the
individual differences between raters in consistently assigning high
or low ratings to the stimuli. Since this second type of reliability
is a measure of what Guilford calls the "systematic error" of raters
(4, p. 273), in a rating situation test reliability becomes a measure
of "rater bias," i.e., the extent of the tendency for single raters
to consistently over-rate or under-rate the particular stimuli
presented to them. The results of the first two studies indicated
that individual rater reliability is constant for self-rating scales
with 5, 7, or 9 categories, drops slightly for 11-category scales,
and appeared to fluctuate for 3-category scales. Because of the
inconsistent results with shorter scales, further investigation
appears necessary.

In the previous paper (2) one of the hypotheses suggested was that
the reliability of ratings is a function of the heterogeneity of the
rated stimuli. Stimuli that are distinctly different in the
perceptual field of the rater should enable the rater to make simple
and consistent judgments of difference between the stimuli, while
stimuli that are quite similar along the rated dimension would
overlap considerably and lead to disagreements between different
raters as to the relative order of the stimuli along this dimension.
Volkmann (7) has summarized the evidence indicating that the width of
a set of rating scale categories is partially dependent on the range
of the stimuli presented to the rater to be judged. The rater tends
to adjust the psychological length of the scale to fit the range of
the stimuli. However, Volkmann points out (7, pp. 280-281) that this
adjustment is not completely flexible: the categories cannot be
indefinitely compressed without a loss of the rater's ability to
scale stimuli. This suggests that rater reliability will decrease as
the homogeneity of the stimuli increases and provides the
experimental hypothesis for this study.
Procedure 


Stimuli. In the previous study (2) 23 […] had rated the list of 20
foods […] (8) as to preference value. From their mean ratings these
foods were ranked in order from the most liked to the least liked
food. Three sublists each containing 10 foods were then selected for
the present study. List 1, the list containing the most heterogeneous
food stimuli, was composed of the top five and bottom five foods from
this ranking. List 2, of intermediate heterogeneity, was composed by
selecting in a double alternation pattern 10 foods from this ranking.
Foods ranked 1, 4, 5, 8, 9, 12, etc. were used for List 2. List 3,
with the most homogeneous stimuli, contained the middle 10 foods:
those ranked from 6 to 15. All three lists had a mean rank of 10.5 in
the original ranking, but rank variances of 58.25, 34.25, and 8.25.
The original list of 20 foods had, of course, a mean rank of 10.5 and
a rank variance of 33.25.
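
The construction of the three sublists, and the rank means and variances quoted for them, can be reproduced in a few lines. The sketch assumes that "rank variance" means the variance with the number of ranks as divisor, which is the reading that matches the printed values; the names are illustrative.

```python
def mean_and_variance(ranks):
    """Mean rank and variance of the ranks (divisor N, not N - 1)."""
    m = sum(ranks) / len(ranks)
    v = sum((r - m) ** 2 for r in ranks) / len(ranks)
    return m, v


all_ranks = list(range(1, 21))                       # the 20 foods, ranked 1..20

list_1 = all_ranks[:5] + all_ranks[-5:]              # top five and bottom five
list_2 = [r for r in all_ranks if r % 4 in (0, 1)]   # 1, 4, 5, 8, 9, 12, ...
list_3 = all_ranks[5:15]                             # ranks 6 through 15

for name, ranks in [("List 1", list_1), ("List 2", list_2),
                    ("List 3", list_3), ("All 20", all_ranks)]:
    print(name, mean_and_variance(ranks))
```

Every list is centered on rank 10.5, so the lists differ only in spread, which is what lets the design treat heterogeneity as the manipulated variable.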

Scales. Four lengths of scale were used: containing 2, 3, 4, or 5
categories. The three descriptive statements used in the previous
study (2) were used to verbally anchor these scales. Scales with 3 or
5 categories had an anchor under each of the end categories and one
under the center category. The 4-category scale also had an anchor
under each end category, but the center statement was placed mid-way
between the two center categories. For the 2-category scale the
center anchor was omitted and the two end anchoring statements used
with the other scales were placed under the two categories. The
lowest category on each scale was given a numerical weight of 1, the
highest category numbered 2, 3, 4, or 5, with intermediate categories
numbered accordingly.

Subjects. The twelve combinations of three stimuli lists and four
lengths of scales were mimeographed with instructions on single
sheets and randomly distributed to 278 Ss. The Ss were students
enrolled in daytime sections of introductory, social, applied, and
educational psychology classes at this university during the spring,
1952-53, semester. These Ss recorded their ratings on standard
five-choice IBM answer sheets for convenience in the later
statistical analysis. The raters were told that the researcher was
investigating the adequacy of different rating scales in assessing
the food preferences of college students and were requested to sign
their names to the ratings.

Analysis. The Ss by stimuli matrix of ratings 
for each of the twelve sub-groups of raters was 
analyzed by analysis of variance procedures. 
From these analyses intraclass estimates of the 
average reliability of individual raters were ob- 
tained (3), along with the reliability with which 
each stimuli-scale combination (test) encouraged 
rater bias among the raters in each subgroup (6, 
pp. 93-95). Judgment of the significance of each 
reliability coefficient was based upon the magnitude
of the F ratio associated with this coefficient.
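
The article cites its sources for the intraclass estimates without printing the formulas. A common form, assumed here, expresses both coefficients in terms of the F ratio for stimuli and the number of raters k: the reliability of the pooled ratings is 1 - 1/F, and the average reliability of a single rater is (F - 1)/(F + k - 1). The function names are illustrative.

```python
def group_reliability(f_ratio):
    """Reliability of ratings pooled over all raters (assumed form: 1 - 1/F)."""
    return 1 - 1 / f_ratio


def individual_reliability(f_ratio, n_raters):
    """Average reliability of one rater (assumed form: (F - 1) / (F + k - 1))."""
    return (f_ratio - 1) / (f_ratio + n_raters - 1)


# List 1 with the 2-category scale: F = 5.83, 24 raters.
print(round(group_reliability(5.83), 2), round(individual_reliability(5.83, 24), 2))

# List 1 with the 4-category scale: F = 9.99, 22 raters.
print(round(group_reliability(9.99), 2), round(individual_reliability(9.99, 22), 2))
```

With these forms the two examples give .83 and .17, and .90 and .29, which agree with the group and individual coefficients printed for those conditions in Table 1. That agreement supports the assumed formulas but does not prove they are the ones the author used.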


Results 


The results of the twelve analyses of variance are given in Table 1
and the obtained reliability estimates are summarized in Table 2.
Individual rater reliability increased approximately linearly as a
function of the heterogeneity of the stimuli with the average
reliabilities of Lists 1, 2, and 3 being .22, .15, and .06. Rater
reliability rose as the number of scale categories was increased from
2, through 3, to 4, and dropped slightly, but consistently, at 5
categories. To assess the significance of these findings, Kendall's
non-parametric rank coefficient W (5) was used. Ranking the rater
reliabilities across the rows in Table 2 gave a W of .558 which has
an approximate probability of .07 of occurring by chance (5, pp.
146-147). Ranking the same reliabilities down the columns resulted in
a W of .333 which is significant at the […] level. Inspection of the
rater reliabilities in Table 2 suggests little interaction between
heterogeneity of the stimuli and length of scale, although no
statistical test of such an interaction was possible. The measures of
rater bias in Table 2 present a somewhat different picture. Rater
bias also increased from 2 to 4 categories and slightly decreased
with the 5-category scale, but the results are somewhat less
consistent than with rater reliability. Also, List 2 was generally
the best set of stimuli in encouraging systematic rater error and
List 1 the least subject to bias, but these results are manifestly a
function of the length of the scale. For example, when the 4-category
scale was used, List 1 was found to be most biased and List 2 least
biased.


Table 1

Reliability Coefficients and Significance Tests for Each Rating Group

                                    Rater Reliability          Rater Bias
        Number of    Number of
List    Categories   Raters      Group  Individual    F         r       F

 1          2           24        .83      .17       5.83**    .03      …
            3           25        .83      .17       5.97**    .48      …
            4           22        .90      .29       9.99**    .63      …
            5           23        .89      .26       9.10**    .40     1.66*

 2          2           20        .62      .07       2.59**    .70      …
            3           25        .83      .16       5.94**    .50      …
            4           21        .85      .21       6.67**    .43      …
            5           21        .79      .15       4.75**    .59      …

 3          2           25        .50      .04       2.01*     .30      …
            3           27        .12      .01       1.14      .52      …
            4           23        .74      .11       3.79**    .60      …
            5           22        .70      .09       3.31**    .33      …

* Significant at the .05 point.
** Significant at the .01 point.



Table 2

Summary of Reliability Coefficients as Functions of
Stimuli Heterogeneity and of the Number
of Rating Scale Categories

                          Number of Scale Categories
               List      2        3        4        5       Mean

Individual       1      .17**    .17**    .29**    .26**    .22
Rater            2      .07**    .16**    .21**    .15**    .15
Reliability      3      .04*     .01      .11**    .09**    .06
               Mean     .09      .11      .20      .17      .14

Individual       1      .03      .48*     .63**    .40*     .38
Rater            2      .70**    .50*     .43*     .59**    .56
Bias             3      .30      .52**    .60**    .33      .44
               Mean     .34      .49      .55      .44      .44

* Group results significant at the .05 point.
** Significant at the .01 point.

Applying the same W method to the bias measures gave values of .083
(ranking across rows) and .021 (ranking down columns), neither of
which is significant at the .90 level. For rater bias the interaction
of stimuli heterogeneity and length of scale appears to be the most
important source of variation among the reliability coefficients.
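
The coefficient of concordance used above can be sketched in a few lines. The following is a minimal illustration, assuming the uncorrected form W = 12S / (m^2 (n^3 - n)), where m is the number of rankings, n the number of objects ranked, and S the sum of squared deviations of the rank totals from their mean. Tied values receive their average rank and no tie correction is applied, so the result may differ slightly from a tie-corrected computation. The function names and the example scores are hypothetical and are not the entries of Table 2.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(rows: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m rows that each score the same n objects (no tie correction)."""
    m, n = len(rows), len(rows[0])
    ranked = [average_ranks(row) for row in rows]
    rank_totals = [sum(r[j] for r in ranked) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in rank_totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores: three rows that order four objects identically.
perfect = [[1, 2, 3, 4], [10, 20, 30, 40], [0.1, 0.2, 0.3, 0.4]]
# Hypothetical scores: two rows that order three objects in opposite ways.
opposed = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(perfect), kendalls_w(opposed))
```

Ranking "across the rows" corresponds to passing each list's coefficients as one row; ranking "down the columns" corresponds to passing each scale length's coefficients as one row.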


Discussion 
The results of this study have partially clarified the relation
between rater reliability and scale length for short rating scales
used for self-ratings. Scales with 2 categories are less reliable than
those with 3 categories, which confirms the results in a previous
study (2). A 4-category scale yields somewhat more reliable stimuli
ratings than either a 3-category or a 5-category scale, with a
5-category scale being slightly more reliable than a 3-category scale.
This last statement (3 vs. 5 categories) contradicts the previous
study (2), and probably only an appeal to omnipresent sampling
fluctuations can reconcile this difference in the results of the two
studies, especially when we note that our first study (1) found no
difference in rater reliability between 3- and 5-category scales. The
general conclusion of no difference in rater reliability with 3- and
5-category scales appears warranted when all three studies are
considered. The small, but significantly consistent superiority of
4-category scales is interesting in light of the hypothesis given by
Jones² that rating scales with an even number of categories may yield
stimulus ratings of higher reliability. The inclusion of a center
category in a scale with an odd number of possible responses may
encourage the rater "error of central tendency" (4, p. 272) and reduce
rater reliability. This hypothesis needs further investigation.

² V. Jones, personal communication.

The hypothesis derived from Volkmann (7) 
that raters cannot compress their psychological 
reference scale to give reliable ratings of homo- 
geneous stimuli was confirmed. The suggestion 
that to achieve reliable ratings the rated stimuli 
should cover a wide range of the rating con- 
tinuum appears eminently reasonable and, for- 
tunately, supported by the experimental findings. 

The somewhat inconsistent fluctuations in rater 
bias noted in the Results section of this paper 
preclude any sweeping generalizations. Rater 
bias, in this investigation, did not appear to be 
a consistent function of either scale length or of 
stimuli heterogeneity. 

In the Procedure section it was noted that List 
1 contained stimuli drawn from the entire range 
of the stimuli used previously and best dupli- 
cated the stimuli variance of the original list. 
This may be an explanation for the slightly 
larger rater bias found for List 2. Thorndike 
(6, pp. 229-230) has pointed out that, when the 
responses to test items (food stimuli) are highly 
correlated (as they usually are with ratings), 
using items selected from a large range of item 
difficulty level (food preference) will encourage 
subject discrimination. Since Lists 1 and 3 con- 
tained stimuli only from the center or from the 
ends of the preference continuum the obtained 
drop in bias for these stimuli lists could be ex- 
pected. However, since this explanation is post
hoc and based upon inconsistent evidence it is
somewhat unconvincing.


Two cautions must be emphasized. The results of this and the previous
two studies (1, 2) can be generalized only to the rating situation
where Ss are requested to report on their own feelings, preferences,
[illegible]. Also, the results can be applied only to ratings made by
relatively naive raters as represented by college students. We cannot
hope that our findings will be confirmed without modification when
rating scales are used to rate more objective stimuli or are used by
more experienced raters.


Summary 


Three lists of 10 food stimuli were se- 
lected so that the lists varied in the hetero- 
geneity of the stimuli. Preference ratings 
were collected from 278 Ss using rating scales 
with 2, 3, 4, or 5 categories. Rating reli- 
ability was highest with the most heteroge- 
neous list and with the 4-category scale and 
was lowest with the most homogeneous list 
and the 2-category scale. Rater bias results
were more tentative, with the list of inter-
mediate stimuli heterogeneity and the 4-cate- 
gory scale most subject to systematic rater 
error on the part of the Ss. 


Received July 29, 1953. 


References 


1. Bendig, A. W. The reliability of self-ratings as a function of the
   amount of verbal anchoring and of the number of categories on the
   scale. J. appl. Psychol., 1953, 37, 38-41.

2. Bendig, A. W. Reliability and the number of rating scale
   categories. J. appl. Psychol., 1954, 38, 38-40.

3. Ebel, R. L. Estimation of the reliability of ratings.
   Psychometrika, 1951, 16, 407-424.

4. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

5. Kendall, M. G. Rank correlation methods. London: Griffin, 1948.

6. Thorndike, R. L. Personnel selection. New York: Wiley, 1949.

7. Volkmann, J. Scales of judgment and their implications for social
   psychology. In J. H. Rohrer and M. Sherif (Eds.), Social psychology
   at the crossroads. New York: Harper, 1951. Chapter 11.

8. Wallen, R. Food aversions of normal and neurotic males. J. abnorm.
   soc. Psychol., 1945, 40, 77-81.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


Scholastic Achievement of Extension and Regular College Students 


Alexis M. Anikeeff 
Oklahoma A & M College 


Should students receive regular college credit for work completed
under an off-campus extension program? In order to permit a more
objective answer to the question, a study was initiated to evaluate
the scholastic achievement of extension students.


Procedure 


Two personnel management examinations were administered to
approximately 39 male extension students and to a similar number of
male, regularly enrolled students before and after exposure to course
subject matter. The same procedure was followed twice with each group.
On the first occasion, the examination covered six chapters, part 5,
of a standard personnel management textbook (3), and lecture material.
On the second occasion, the examination again covered six chapters,
part 7, and lecture material. Each examination contained 50
four-choice test items. In addition, each examination had been twice
refined by rejection of items which failed to meet established
internal consistency standards. About 75% of the test items on each
examination covered textbook information, while 25% of the items
tested knowledge of lecture material. The same individual delivered
identical lectures to both groups of students, and also administered
the examinations.

Although the course in personnel management is categorized as a junior
level course, and its membership is predominantly composed of junior
level students, approximately one-fifth of the students were seniors
and one-tenth of the students were classified as sophomores. The
course was a graduation requirement for five of the 39 students. On
the other hand, 34 students selected the course as an elective. Not
more than one-quarter of the regularly enrolled day students were
veterans of World War II.

The evening extension class was composed entirely of World War II
veterans. Each veteran received thirty dollars per month for
attending and being enrolled in the course.
All members were enrolled under one of the 
following two provisions: 1. College degree 
program, or 2. Two-year college certificate 
program. The distinction between class mem- 
bers classified under the two programs was 
almost completely ambiguous. Selection of 
either program rested solely with the student. 
Extension students shifted from one program 
to the other program indiscriminately, and 
with considerable vacillation. 

Both groups of students were matched ac- 
cording to initial performance on each of the 
tests. Standard errors of the means were 
corrected for matching on an infallible cri- 
terion in the case of a “before” and “after” 
comparison of the same group on the same 
test. All other standard errors of the means 
were corrected for matching on a fallible cri- 
terion; namely, initial performance on the 
test. Formulas for obtaining the corrected 
standard errors of the means were available 
in Guilford’s publication (1, p. 196). The 
corrected standard errors of the means were 
employed in formulas used to establish the 
significance of the difference between arith- 
metic means of correlated data. 

Standard errors of standard deviations were 
corrected for matching on an infallible cri- 
terion or on a fallible criterion as the data 
dictated. Peters and Van Voorhis (2, p. 143)
supplied the formulas for this computation.
The corrected standard errors of standard
deviations were used in formulas for deriving 
the significance of the difference between 
standard deviations of correlated data. 

For the purpose of securing a value to be 
used in the formulas for determining the sig- 
nificance of the difference between arithmetic 
means and between standard deviations of 
correlated data, a coefficient of correlation 
was obtained for each of twelve comparisons. 
When the number of cases in one distribution exceeded the number of
cases in the distribution with which the first distribution was
compared, the superfluous unmatched cases were dropped, as suggested
by Peters and Van Voorhis (2, p. 449). Approximately 12½% of the
cases were dropped for one comparison, 10% of the cases were dropped
for two comparisons, 2½% of the cases were dropped for four
comparisons, and none were dropped for the remaining five comparisons.
In all cases, the Pearsonian product-moment coefficient of correlation
was employed.

Six comparisons of data were made for 
each of the two tests which were adminis- 
tered: 1. Extension students before studying 
vs. extension students after studying, 2. Day 
students before studying vs. day students 
after studying, 3. Extension students before 
studying vs. day students before studying, 4. 
Extension students before studying vs. day 
students after studying, 5. Extension students 
after studying vs. day students before study- 
ing, 6. Extension students after studying vs. 
day students after studying. For the pur- 
pose of this investigation, both presumed and 
actual studying are considered to be studying. 


Results 


When considering the data found in Table 
1, it is well to recall that the results are 
based upon tests which have been refined 
twice. In addition, it is noteworthy that 
each test contains 50 four-distracter ques- 
tions. Under these circumstances, an arith- 
metic mean of 12.50 correct answers can be 
obtained by random guessing. A further 
analysis of random guessing indicates that 
an arithmetic mean of 18.50 must be ob- 
tained in order that random guessing can be 
discounted at the 5% level of confidence.
Moreover, a mean of 20.39 must be obtained
to reach the 1% level of confidence. 
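
The guessing figures quoted above can be reproduced with a short computation. The sketch below assumes, since the article does not state its method, that the cutoffs come from the normal approximation to the binomial with two-tailed deviates: the chance mean is Np, the standard error is the square root of Npq, and the cutoff is the mean plus z standard errors. The function name and parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def guessing_cutoff(n_items: int, n_choices: int, alpha: float) -> float:
    """Score needed to discount pure guessing at a two-tailed level alpha.

    Assumes every item has n_choices equally likely options, one correct,
    and uses the normal approximation to the binomial.
    """
    p = 1 / n_choices                          # chance of a correct guess
    chance_mean = n_items * p                  # expected score from guessing
    std_error = sqrt(n_items * p * (1 - p))    # sqrt(Npq)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed normal deviate
    return chance_mean + z * std_error


print(f"chance mean: {50 / 4:.2f}")                     # 12.50
print(f"5% cutoff:  {guessing_cutoff(50, 4, 0.05):.2f}")  # 18.50
print(f"1% cutoff:  {guessing_cutoff(50, 4, 0.01):.2f}")  # 20.39
```

Under these assumptions the 50-item, four-choice case gives 12.50, 18.50, and 20.39, which agree with the values stated in the text.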

Evidence that factors other than random guessing contribute to a naive
student's test score is documented by published research and the
author's analysis of test responses. As a consequence it is reasonable
to assume that random guessing represents a very conservative
criterion of scores which could be obtained by students who were
unexposed to course subject matter. In view of this assumption, the
fact that the mean of extension students after studying does not
differ significantly from the random guessing mean on the first
experimental test, and fails to reach the 1% level on the second
experimental test, is worthy of consideration by extension program
administrators.

The significance of differences was derived for testing the
reliability of the differences between arithmetic means, between
standard deviations, and between the obtained coefficients of
correlation and true coefficients of correlation assumed to be zero. A
table illustrating specific details of the foregoing analysis was
omitted to reduce publication costs. However, the results indicate
that the SD's and AM's of the extension student distribution after
studying are not significantly different from those obtained from the
regularly enrolled students before studying for the first experimental
examination.

On the second experimental examination, the AM of extension students
after studying does not differ significantly from the AM of regularly
enrolled students before studying, as on the first examination.
However, the difference between SD's is significant at the 5% level of
confidence for the same comparison on the second examination. In
addition, the estimated r is not significantly different from zero for
the same comparison on the first examination, although the estimated r
for the second experimental examination does differ from zero at the
5% level of confidence.

Both extension and regularly enrolled students obtained higher scores
after studying than they did before studying, on each of the
administered examinations. However, the regularly enrolled students
achieved higher scores than extension students, both before studying
and after studying.


Table 1

Performance of Extension and Day Students Under Varying Conditions of
Test Administration

                    Number        Arithmetic      Standard
                    Cases         Mean            Deviation      Coef.
Sequence   Exam.    Day   Ext.    Day    Ext.     Day    Ext.    Corr.*
Pretest    1        39    39      17.2   14.4     3.8    3.7     [?]
Posttest   1        39    38      28.4   17.8     5.7    4.4     [?]
Pretest    2        40    39      18.5   16.3     3.8    4.3     [?]
Posttest   2        [?]   35      27.5   19.9     5.4    [?]     [?]

* Day students vs. extension students.


Discussion 


If extension students are to receive college credit for courses
offered in an off-campus program, it is reasonable to expect that the
achievement of extension students should be comparable to the
achievement of regularly enrolled students. Practical considerations
frequently becloud the issue. Day students ostensibly exert their
major effort toward the acquisition of knowledge prescribed by school
administrators. Conversely, extension students in evening classes
exert their major effort toward earning a livelihood. In the actual
situation a considerable overlapping of goals may occur. Nevertheless,
differences in orientation could account for differences in
motivation, and in this manner be reflected in differences of
achievement between extension students and regularly enrolled
students.

Physiological and psychological types of fatigue probably exert their
insidious influences on both the instructor and the students. The
instructor frequently drives many miles, bolts unpalatable food in
unfamiliar surroundings, and talks for three hours about subject
matter previously discussed during the day to heavy-lidded students
who eagerly await dismissal and reunion with their families.
Differences in educational backgrounds between extension and regularly
enrolled students, as well as other factors, may also contribute to
the difference in achievement scores.
However, despite the reasons for the differ- 
ences between the achievement of extension 
and regularly enrolled students, if further 
studies support results found in this investi- 
gation, the case for granting college credit for 
extension work will be severely challenged. 
If the administrators persisted in granting 
college credit under these circumstances, the 
administrators would be honor bound to give 
college credit to the regularly enrolled college 
students solely on the basis of payment of 
registration fees. Under these conditions 
knowledge of lecture and textbook material 
would be optional. 


Summary 


Identical pretest and posttest examinations 
were twice administered to a group of exten- 
sion students and to a group of regularly en- 
rolled college students. Six comparisons of 
educational achievement between groups were 
obtained for each of two examinations. 

1. The arithmetic means of regularly en- 
rolled students on pretests did not differ sig- 
nificantly from the posttest arithmetic means 
of extension students on identical examina- 
tions. 

2. The posttest mean of extension students 
on the first examination did not differ signifi- 
cantly from a mean which could be obtained 
by random guessing. On the second examina- 
tion, the posttest mean of extension students 
differed at the 5% level of confidence from 
the mean which could be obtained by random 
guessing. 

3. In view of the obtained results, a ques- 
tion was raised about the advisability of 
granting college credit for work performed in 
evening off-campus extension courses. 


Received June 29, 1953. 


References 


1. Guilford, J. P. Fundamental statistics in psychology and education.
   (2nd Ed.) New York: McGraw-Hill, 1950.
2. Peters, C. C. and Van Voorhis, W. R. Statistical procedures and
   their mathematical bases. New York: McGraw-Hill, 1940.
3. Scott, W. D., Clothier, R. C., and Spriegel, W. R. Personnel
   management. (4th Ed.) New York: McGraw-Hill, 1949.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954 


Index of Collaboration for Test Administrators 


Alexis M. Anikeeff 
Oklahoma A & M College 


Freedom to secure appropriate information 
from fellow test-takers during the administra- 
tion of an examination may be considered an 
inalienable right by some individuals. Others 
may denounce this procedure as a scourge 
upon the American educational system. Per- 
haps both groups will agree that from an un- 
moral, and a solely objective viewpoint, such 
a practice is undesirable because it lowers the 
reliability and validity of the testing process. 

Proctoring examinations, distributing sev- 
eral forms of an examination, rearranging the 
same set of questions, seating test-takers at 
maximum distances from each other, and 
haranguing test-takers that virtue will tri-
umph are methods which have been used
with varying degrees of success. Concomit-
ant with the foregoing procedures, would it
be possible to develop some method or tech-
nique which could indicate the presence of
collaboration on multiple-choice tests even
though the act of collaboration went un- 
noticed by the test administrator or the 
proctor? The purpose of this study was to 
develop and test the usefulness of such a 
technique. 

A scrutiny of examination papers submitted 
by two individuals who were obviously col- 
laborating with each other during the ad- 
ministration of an examination suggested the 
feasibility of comparing the distracters se- 
lected by each individual for his incorrectly 
answered questions. Under somewhat simi- 
lar circumstances, Bird (1) found that a 
comparison of incorrectly answered multiple- 
choice and completion questions offered defi- 
nite possibilities of detecting collaboration, 
For the purpose of the present study, col- 
laboration is defined as any voluntary or in- 
voluntary dissemination of information on 
the part of one test-taker for the purpose of 
improving the test score of another test-taker 
during the administration of an examination. 

Documented knowledge indicates that random guessing would permit one
question out of four questions to be answered correctly when four
optional answers are presented for each question and one of the four
answers is always correct. Random guessing by definition implies a
complete absence of knowledge about the subject matter tested.
Therefore, if four wrong answers are substituted for the correct
answers, random guessing would nevertheless permit one of the wrong
answers to be selected by chance alone. However, to the extent that
more than one of the arbitrarily selected wrong answers is chosen
under these circumstances, something other than random guessing may be
operating.

The index of collaboration assumes that within specific levels of
confidence it is possible to detect collaboration between test-takers.
A comparison is made of distracters selected for incorrectly answered
questions by two or more individuals. Random guessing, as previously
indicated, would permit one-fourth of the total number of incorrectly
answered four-choice questions selected by one test-taker to be
answered with identical distracters by an adjacently seated
individual. A simple illustration of the foregoing situation could be
portrayed by two individuals who managed to answer 20 identical
questions incorrectly on a 50-question examination which employed
four-choice questions and one of the optional answers was always
correct. It is reasonable to believe that five of the 20 identical
questions could be answered incorrectly by using identical
distracters. The number of identical incorrect answers above five
needed to be shared by both test-takers before collaboration is
implied can be determined by the use of a formula (2) for the standard
error of a frequency, √(Npq), or read from Table 1 which is based on
this formula. Symbol N in the formula refers to the number of
questions answered incorrectly under these circumstances.
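
The arithmetic behind the index can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the author's working: it takes the chance expectation Np with p = 1/4, as the text does, adds z standard errors of √(Npq), and uses normal deviates of about 1.96, 2.576, 3.291, and 3.891 for the four columns. Those four deviates are inferred here from the legible entries of Table 1 and are not stated in the article. The function name is hypothetical.

```python
from math import sqrt

# Normal deviates assumed for the four confidence columns of Table 1
# (an inference from the printed values, not given in the article).
ASSUMED_DEVIATES = (1.96, 2.576, 3.291, 3.891)


def collaboration_cutoff(n_wrong: int, z: float, p: float = 0.25) -> float:
    """Identical wrong answers needed before chance is discounted.

    n_wrong: N, the number of questions answered incorrectly.
    z:       normal deviate for the chosen level of confidence.
    p:       assumed chance of an identical distracter (1/4 in the article).
    """
    expected = n_wrong * p                    # matches expected by chance
    std_error = sqrt(n_wrong * p * (1 - p))   # standard error of a frequency
    return expected + z * std_error


# The article's illustration: 20 questions wrong, 5 matches expected by chance.
row_20 = [round(collaboration_cutoff(20, z), 1) for z in ASSUMED_DEVIATES]
row_40 = [round(collaboration_cutoff(40, z), 1) for z in ASSUMED_DEVIATES]
print(row_20)
print(row_40)
```

For 20 wrong answers the first cutoff rounds to 8.8, so under these assumptions two papers would need to share about nine identical distracters, four more than the five expected by chance, before the first level of confidence is reached.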


Procedure

An effective measure of the collaboration index's validity is an
admission of guilt by individuals who have been identified as
collaborators by the index. At least ten cases involving two
individuals have been validated by direct admission, indirect
admission; e.g., "I won't deny it, but I certainly won't admit it," or
by objective observation in isolated instances when an individual was
found flagrantly copying answers from his neighbor and was permitted
to continue in this manner for the duration of the examination. Under
these circumstances it would appear that the index of collaboration
has been successful in identifying every known case of collaboration
within the past two years. Unfortunately, little was known about the
success of the collaboration index in the identification of the
unknown cases of collaboration.


Table 1

Index of Collaboration for Use with Four-Choice Questions

               Number of Identical Questions Wrong on Examination
Number of      Paper B Using Same Distracters as Found on Examination
Questions      Paper A Needed to Establish Existence of Collaboration
Wrong on       at Various Levels of Confidence
Examination
Paper A                       Confidence Levels
               5%        1%        .01%      .001%
 4             2.7       [?]       [?]       [?]
 5             3.2       3.8       4.4       5.0
 6             3.6       4.2       5.0       5.6
 7             4.0       4.7       5.5       6.2
 8             4.4       5.2       6.0       6.8
 9             4.8       5.6       6.5       7.3
10             [?]       6.0       [?]       [?]
11             5.6       [?]       [?]       [?]
12             5.9       6.9       7.9       8.8
13             6.3       7.3       8.4       9.3
14             6.7       7.7       8.8       9.8
15             7.0       8.1       9.3       10.1
16             7.4       8.5       9.7       10.7
17             7.8       [?]       [?]       11.2
18             8.1       9.2       10.5      11.6
19             8.4       9.6       [?]       12.1
20             8.8       10.0      11.4      12.5
21             9.1       10.4      11.8      13.0
22             9.5       10.7      12.2      13.4
23             9.8       11.1      12.6      13.8
24             10.2      11.5      13.0      14.2
25             10.5      11.8      13.4      14.7
26             10.8      12.2      13.8      15.1
27             11.2      12.6      14.2      15.5
28             11.5      12.9      14.5      15.9
29             11.8      13.3      14.9      16.3
30             12.2      [?]       [?]       [?]
31             12.5      14.0      15.7      17.1
32             12.8      14.3      16.1      17.5
33             13.1      14.7      16.4      17.9
34             13.4      15.0      16.8      18.3
35             13.8      15.4      17.2      18.7
36             [?]       [?]       [?]       [?]
37             [?]       16.0      [?]       [?]
38             14.7      16.4      18.3      19.9
39             15.0      16.7      18.6      20.3
40             15.4      17.1      19.0      20.6

In order to further test the effectiveness of the 
index of collaboration, a group of 17 regularly 
enrolled college students were asked to collabo- 
rate with each other during a second administra- 
tion of a personnel management classroom ex- 
amination which contained 50 four-choice ques- 
tions. As a result of lengthened summer session 
classroom periods, students were asked to return 
to the classroom 45 minutes after the beginning 
of the first examination for the purpose of hear- 
ing an important announcement. When the stu- 
dents reconvened, they were informed that they 
would receive an A-grade weighted equal to that 
of one regular examination if they would retake 
their previous examination under collaboration 
conditions. The students were told that they 
must collaborate with one or with several stu- 
dents in order to receive their reward. The stu- 
dents were further informed that they were in a 
simulated regular examination situation, and con- 
sequently, any detectable case of collaboration 
would be discouraged by the instructor. 

Students kept a record of the number of an- 
swers which they obtained from each student 
with whom they collaborated. This information 
was retained by each student until the collabora- 
tion analysis was completed in order to insure 
greater objectivity of the analysis. 


Results 


Detailed results are available in Table 2 
where the number of identical wrong dis- 
tracters are indicated as being shared by each 
student paired with every other student. Of 
the 17 students participating, collaboration 
could not be uncovered for seven, collabora- 
tion was found operating on the 5% level for 
two students, and on the .001% level for eight students.

A comparison of collaboration index analysis with the data of extent
of collaboration kept by each student, Table 3, reveals that, in the
experimental situation, the index of collaboration failed most
significantly in the identification of student L who secured as many
as twenty answers by copying from three other students. On the other
hand, the index was able to identify two students, G and O, at about
the .001% level of confidence when both cooperated closely with each
other and secured only five answers from each other. Other cases
appear to substantiate the belief that the index is most
discriminating in the
identification of one-way collaboration when 
an individual copies a sizable number of an- 
swers from a single test paper. A substan- 
tially smaller number of answers apparently 
need to be shared when active two-way col- 
laboration is involved. Moreover, the col- 
laboration index appears unable to identify 
with any particularly useful degree of ac- 
curacy, the individual who copies answers 
from several individuals when the individuals 
in question fail to reciprocate his behavior. 
In addition, the 5% level of confidence was 
found much too crude for accurate identifi- 
cation of collaboration. 


Discussion 


The collaboration index is premised upon the operation of random
guessing. Consequently, the effectiveness of the index will vary
directly with the degree that random guessing is operating. Since the
classroom examination administered to the experimental group was
refined twice, and the distracters lacking pulling-power were
eliminated, it is reasonable to believe that random guessing was
present to a greater extent in the experimental situation than it
would have been if a non-refined examination were used. Despite the
refinement procedure, it would nevertheless be safe to assume that all
distracters do not have equal attraction values. An analysis of
distracter effectiveness made by tallying the number of times a
distracter is selected could suggest whether a modification or an
adjustment is needed in order to secure a more accurate indication of
collaboration. For example, if on a four-choice question two
distracters are never chosen, and one of the remaining two answers is
the correct one, then by definition an individual has only one chance
of making an error. To the extent that a considerable variation is
found in the number of effective distracters among the questions on
the same examination, the application of the principle of binomial
expansion may prove more useful than the standardized index of
collaboration.

Although the index of collaboration used in this study, Table 1, is
developed for use with four-choice questions, a similar index could be
developed for examinations using any other number of distracters.
Moreover, in the event that a test administrator of a four-choice
examination is unaware of the effectiveness of his distracters, and if
an analysis of distracters is for some reason impractical at the
moment, he may feel more secure in using an index based upon
three-choice questions. Under these conditions he would assume that
only three of the four optional answers are effectively operating in
terms of attraction for any four-choice question.

The usefulness of the index of collaboration is not limited solely to
identification of collaboration.
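
The Discussion's remark that unequal distracter attraction calls for the binomial expansion can be illustrated with a small computation. The sketch below is one possible reading, not the author's procedure, and it departs from the one-in-four assumption behind Table 1: for each question it estimates the chance that two independent test-takers who both miss the question choose the same distracter as the sum of the squared selection shares of the distracters, then expands the product of the per-question binomials to obtain the exact chance of k or more matches. The tallies and function names are hypothetical.

```python
from typing import Sequence


def match_probability(distracter_tallies: Sequence[int]) -> float:
    """Chance that two independent wrong answers fall on the same distracter.

    distracter_tallies holds how often each wrong option was chosen by the
    class; a distracter that is never chosen contributes nothing.
    """
    total = sum(distracter_tallies)
    return sum((count / total) ** 2 for count in distracter_tallies)


def chance_of_at_least(match_probs: Sequence[float], k: int) -> float:
    """Exact chance of k or more matches across questions with unequal odds."""
    dist = [1.0]  # dist[j] = probability of exactly j matches so far
    for p in match_probs:
        updated = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            updated[j] += prob * (1 - p)   # no match on this question
            updated[j + 1] += prob * p     # match on this question
        dist = updated
    return sum(dist[k:])


# Hypothetical tallies: the article's case of one effective distracter,
# where any two wrong answers must coincide.
print(match_probability([10, 0, 0]))
# Two hypothetical questions with even odds of a match on each.
print(chance_of_at_least([0.5, 0.5], 2))
```

With a single effective distracter the match probability is 1, which corresponds to the article's observation that such a question leaves only one way of making an error and so carries no evidence of collaboration.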

Table 2

Paired Comparison of the Number of Identical Wrong Distracters Shared
by Experimental Group Members During Collaboration Examination

Student Code        A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q
No. Wrong on Exam   24  22  19  17  19  19  05  22  23  05  23  26  19  15  05  22  25

[Entries of the paired-comparison matrix are illegible in this copy.]

* Probability of collaboration 19:1.
† Probability of collaboration 9,999:1.



Table 3

Number of Answers Indicated by Students as Being Copied from Other
Students During Collaboration Examination

Student Code        A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q
No. Wrong on Exam   24  22  19  17  19  19  05  22  23  05  23  26  19  15  05  22  25

[Entries of the table body are illegible in this copy.]

Note: Student in vertical column copies from student in horizontal column.
* Correctly identified as collaborating at 5% level, one-way collaboration.
† Correctly identified as being involved in collaboration at .001% level.
‡ Involved in collaboration through third individual.
§ Presumed victims of 5% level erroneously identified collaborators.

Allaying the suspicions of collaboration may under some circumstances
prove more heartening than the establishment of proof of
collaboration. For example, did the student who failed regular quizzes
earn an excellent grade on the final examination as a result of
increased motivation and preparation or through collaboration?
Available evidence supports the contention that the use of the index
of collaboration has motivated many test-takers to become more wary of
their fellow classmates during the administration of classroom
examinations.

In view of the available data it is reasonable to believe that the
index of collaboration is able to identify cases of large scale
collaboration which could not otherwise be identified, owing to the
polished skill of the collaborators. On the other hand, the index of
collaboration is relatively ineffective in identifying the individual
who secures information sporadically and by means of furtive glances
at numerous papers which surround him during his examination period.


Summary

An index for the identification of collaboration among test-takers
during the administration of an examination was developed, and its
effectiveness tested on an experimental group.

1. The index of collaboration was found reasonably effective in the
identification of collaboration despite the inability of the
administrator to detect its existence during the administration of the
examination.

2. The index of collaboration was most effective in the identification
of large scale one-way collaboration involving the copying of at least
16% of the answers from a single adjacent test-taker.

3. Two-way active collaboration was identified when only 10% of the
answers were shared by two individuals.

4. Identification of collaboration was least effective when an
individual copied answers from several other test-takers in a sporadic
and unsystematic manner.

Received July 30, 1953.


References

1. Bird, C. The detection of cheating in objective examinations.
   Sch. & Soc., 1927, 25, 261-262.
2. Guilford, J. P. Fundamental statistics in psychology and education.
   (2nd Ed.) New York: McGraw-Hill, 1950.


The Journal of Applied Psychology
Vol. 38, No. 3, 1954 


Group Manual Dexterity in Women * 


Andrew L. Comrey and Gerald Deskin 
The University of California at Los Angeles 


Two previous articles (1, 2) described the 
results of experiments with a group manual 
dexterity task in men. The present experi- 
ment is a duplicate of the second of the previ- 
ous experiments using women college stu- 
dents as subjects instead of men. Although 
a complete description of the experimental 
procedures can be found in the previous re- 
ports, a brief summary will be given here. 


Procedure 

Sixty pairs of volunteer women university 
students were given six trials on a modifica- 
tion of the Purdue Pegboard Assembly Task. 
Instead of making each successive assembly 
by starting with the preferred hand, the sub- 
ject was required to alternate the hand used 
to place the first element of each successive 
assembly. Thus, instead of using the stand- 
ard instructions for the Purdue Pegboard 
Assembly Task, the subjects were instructed 
to begin each assembly after the first one 
with the same hand used to place the final 
washer on the preceding assembly. On this 
individual task, each person of the pair 
worked on her own pegboard. The two peg- 
boards were placed end to end, although the 
girls could not watch each other since a 
screen was placed between them. 

After the six individual trials, the screen 
and one of the boards were removed from 
the table. The other pegboard was placed 
lengthwise between the two girls. Six more 
trials were taken in which the girls worked 
together on the assemblies instead of work- 
ing individually. The first subject, for ex- 
ample, would place a peg in the first hole on 
her side of the board, after which the second 
subject would add a washer and the first sub- 
ject would follow with a collar, and finally 
the second subject would complete the as- 
sembly with another washer. Instead of re- 
peating this operation, however, the subject 
who finished the first assembly would begin 
the next assembly by placing a peg in the 


* This research was supported by a grant from the 
University of California. 


second hole on her side of the board. The first subject
would then place on the washer, and so on. Thus, the
assemblies formed a zigzag pattern down the board,
first one subject beginning the assembly and then the
other, functions alternating all the time. In terms of
the kind of set required, this process appears to be
similar to the individual assembly task in which the
subjects were required to alternate hands in starting
successive assemblies.

On the basis of the sum of scores for the last four
trials on the individual task, the members of pairs
were divided into two categories, “high” and “low.”
Reliabilities for “high,” “low,” and “group” scores
were determined by correlating scores on trials three
and five with scores on trials four and six and
correcting by the Spearman-Brown formula. “Difference”
scores were obtained by subtracting the “low” total
score of each pair from the “high” total score. All
intercorrelations were computed between “high,” “low,”
“difference,” and “group” scores. The correlations of
the “high” and “low” scores with the “group” scores
were corrected for attenuation in both variables and
beta weights computed for predicting “group” scores
from “high,” “low,” and “difference” scores.¹ The
multiple correlation coefficient was also computed.
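
The two corrections named in this section can be sketched numerically. The fragment below is an illustration only and not the authors' computing procedure; the function names and the input values (.60, .42, .81, .64) are hypothetical and are not taken from Table 1. It steps up a correlation between two half-length scores by the Spearman-Brown formula and corrects an obtained correlation for attenuation in both variables.

```python
import math


def spearman_brown(r_half, k=2.0):
    """Step a part-test correlation up to a test k times as long."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between error-free measures of x and y.

    r_xy is the obtained correlation; r_xx and r_yy are the reliabilities.
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values chosen for the example, not taken from Table 1.
reliability = spearman_brown(0.60)                     # 1.2 / 1.6
corrected = correct_for_attenuation(0.42, 0.81, 0.64)  # 0.42 / (0.9 * 0.8)
print(round(reliability, 2), round(corrected, 2))
```

Because the divisor in the attenuation formula is at most one, the corrected coefficient is never smaller in absolute value than the obtained one, which is why the corrected figures are read as an upper limit.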

¹ In the two previous experiments, the “difference”
scores were not included in the analysis. It was
decided to add them as a factor of possible value, and
to give a check on the whole process of obtaining beta
weights. Since the “difference” scores are completely
determined by the “high” and “low” scores, the whole
system is overdetermined for computing beta weights so
that only an approximation to a solution is usually
obtained. The errors in reproducing original
correlations from the partial regression equations will
be small if the correlations are internally consistent.
These errors are shown under the “e” column in Table 1.
The “high”-“low” correlation was not corrected for
attenuation, as in the previous analysis, because this
correlation should not increase as the proportion of
error variance decreases. Data from the previous
experiments have been reworked so that Table 1 gives
the figures for all three studies based on the same
analysis decided upon for the present study.



In the two previous experiments, as well as this one,
the main objective has been to determine the extent to
which performance of subjects on a group task could be
predicted from knowledge of how well they could do
individually on a very similar kind of task.
Corrections for attenuation were used to gain some
knowledge of the theoretical upper limit of this
predictability.


Results 


The results of this experiment and the two previous
experiments with men are summarized in Table 1. The
results for the men are given in rows marked “I” and
“II,” respectively, whereas the figures for the women
in the present experiment are given in the rows marked
“III.” In the first column of Table 1 are listed the
total score categories: “high,” “low,” “difference,”
and “group”; means and standard deviations for these
sets of scores are given in the second and third
columns, respectively. The reliabilities are given in
column four, with the intercorrelations of score
variables appearing in the next four columns. Beta
weights are given in the next column, and in the last
column are given the differences between experimental
correlations with “group” scores and those reproduced
by inserting beta weights and correlations in the
partial regression equations (see footnote 1).

The pattern of results for university women 
is much the same as that for university men 
except that the over-all multiple correlation 
is only .62 as compared with .66 and .70 in 


the first two experiments, respectively. The 


differences are not significant, however. 
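
As a small arithmetic check, squaring each multiple correlation gives the proportion of variance in “group” scores accounted for by the predictors. The sketch below uses only the three coefficients quoted in the text; the variable names are illustrative.

```python
# Multiple correlations reported for Experiments I, II, and III.
multiple_r = {"I": 0.66, "II": 0.70, "III": 0.62}

# The squared multiple correlation is the proportion of variance predicted.
variance_predicted = {exp: round(r * r, 2) for exp, r in multiple_r.items()}

for exp, share in variance_predicted.items():
    print(f"Experiment {exp}: R = {multiple_r[exp]:.2f}, R squared = {share:.2f}")
```

Each of the three proportions falls short of one half.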

The most important fact which emerges 
from these three experiments is that a sur- 
prisingly small proportion of the total vari- 
ance on a group-performance task can be 
predicted on the basis of how well the team 
members can perform individually on an ap- 


Table 1

Summary of Three Experiments

[Table body illegible in the scan. As described in the
text, the columns give the score category (High, Low,
Diff., Group) for each experiment (I, II, III), the
mean, the standard deviation, the reliability, the
correlations with High, Low, Diff., and Group, the
beta weight, and the discrepancy e.]

Note: Experiments I, II, and III were with [number
illegible] pairs of male undergraduate psychology
students, [number illegible] pairs of male graduate and
undergraduate psychology students, and 60 pairs of
female undergraduate psychology students, respectively.
The numbers under the “e” column are the discrepancies
between obtained correlations with group scores and
those correlations which result from placing the best
beta weights in the partial regression equations. In
the three experiments, the multiple correlations were
.66, .70, and .62, respectively, with the squares equal
to .44, .49, and .38. The reliability values for the
difference scores were computed as the geometric mean
of the [“high” and “low”] reliability estimates.

* These correlation coefficients were corrected for
attenuation in both variables before they were used to
obtain the beta weights.



parently similar kind of task. In none of the 
three experiments did that proportion reach 
one half. These figures are based on cor- 
relations corrected for attenuation, and, as 
such, represent what could be done with error- 
less measures. Practically, the possibilities 
for prediction are even less impressive. 
These results suggest that there are other 
important behavior variables to be measured 
which will help to determine how well a per- 
son will perform in cooperative kinds of tasks. 
Evidently a careful analysis of the physical 
operations he performs in the group task, fol- 
lowed by measurement of these job elements 




by means of individual tests, leaves out im- 
portant sources of variance. Just what the 
nature of these other sources of variance may 
be is not immediately evident. Further
research will be needed to explore this problem.


Received August 3, 1953. 


References 


1. Comrey, Andrew L. Group performance in a manual
dexterity task. J. appl. Psychol., 1953, 37, 207-210.

2. Comrey, Andrew L. and Gerald Deskin. Further
results on group manual dexterity in men. J. appl.
Psychol., 1954, 38, 116-118.





A Short Method of Factor Analysis * 


Robert C. Wilson 
Reed College, Portland, Oregon 


and 


Andrew L. Comrey 
The University of California at Los Angeles 


This paper will point out the need for a short method
of factor analysis in certain kinds of problems,
describe briefly such a method, and finally present an
empirical comparison of the short method with the
complete centroid method on a particular example.

Situations frequently arise in psychological research
where several variables have been developed over a
period of time to assess certain characteristics of
individuals, groups, and situations. The variables may
be scores on individual items, groups of homogeneous
items, or other types of data. At some point, revision
may be needed to yield a smaller number of variables
which will cover the domain with equal effectiveness
and greater economy. This economy may be achieved
through the elimination of variables which are
measuring only those functions already assessed
elsewhere, and through the retention of relatively
uncorrelated measures.

Where any considerable number of variables is
involved, factor analysis constitutes a useful
technique for providing the information upon which
such a program may be undertaken. Unfortunately, the
most generally used factor-analysis procedures are so
time consuming that they are often by-passed in
situations where they could be helpful. To factor
analyze the items of a test, for example, by the
complete centroid method becomes quite an expensive
and time-consuming assignment if the test contains as
many as thirty or forty items.³ Or, if we are dealing
with about the same number of questionnaire variables,
each variable being assessed by means of a pool of
homogeneous items, a similar situation prevails. Even
though a factor analysis might be valuable, then, it
may not seem feasible to take on the amount of labor
required in the methods traditionally employed.

1 This research was carried out under Contract
[number illegible] between the University of Southern
California and the Office of Naval Research. The
research was supervised by R. C. Wilson. The modified
diagonal method of factor analysis herein discussed
was conceived by A. L. Comrey. The opinions expressed
are our own and are not necessarily those of the
Office of Naval Research.

2 J. P. Guilford, and H. J. Locke are the
co-responsible investigators.

Thurstone (1) has described a diagonal 
method of analysis which is considerably less 
laborious than a complete centroid solution, 
especially when the problem of rotation is 
taken into account. Thurstone feels that the 
diagonal method, while theoretically correct, 
is inadequate as a method of analysis because, 
generally, the communalities which are in- 
serted in the diagonals cannot be very ac- 
curately estimated. Since the mechanics of 
the method place a rather heavy dependence 
upon the diagonal cell values, inaccuracies in
these values may produce considerable dis- 
tortion in the final result. 

Although it is difficult to obtain accurate estimates
of the communalities, it is frequently possible to
compute good reliability estimates for the tests.
Since test reliabilities can be estimated more
accurately than communalities, the accuracy of the
diagonal method of analysis can be improved by
substituting reliabilities for communalities in the
diagonals. This changes the nature of the problem to
some extent, however. Instead of analyzing only the
common factor variance, or some estimate of the extent
of that variance, the total true variance is used,
including what would ordinarily be assigned to
specific factors if communality estimates were used.

³ [First sentence illegible.] The [method described]
in this paper could be employed effectively in the
preliminary stages of such an analysis for the purpose
of shortening the iterative process.

Many factor analysts would object to ana- 
lyzing the total test space in that they are 
looking for factors representing underlying 
variables which “explain” the common ele- 
ments among the variables concerned. If 
this is the objective, of course, it would be 
unsatisfactory to use the diagonal method 
modified by using reliabilities in the diagonals
instead of guessed communalities. On the
other hand, careful analysis of the purposes 
in factoring may reveal that the objective is 
not at all that of discovering some latent ex- 
planatory structure. There may be, for ex- 
ample, several well-developed scales which 
need to be refined and extended, but which 
cannot be discarded just because a factor 
analysis solution suggests that the “best” 
variables are somewhere in between the ones 
actually in use.⁴

The problem under such circumstances is 
more nearly one of imposing a relatively arbi- 
trary structure upon the domain rather than 
attempting to develop a new set of variables 
of special intrinsic explanatory value. When 
this is the case, there is no good reason why 
the first factor should not be aligned with the 
best developed and most useful variable. The 
next factor can go through or near a variable 
of established value which is approximately 
orthogonal to the first one. 

In this way, the factors can be made to 
coincide in so far as possible with variables 
which are already serving adequately at the 
time. Other variables fall where they may, 
and are eliminated as they fail to add any- 
thing to the structure needed to lay out the 
domain under investigation. As a result of 
such an analysis, some variables may need to 
be refined in certain directions to make them 
more independent of one another. Other
variables can safely be dropped altogether in 


⁴ Thurstone (2) reports a study of Guilford’s
temperament schedules, in which he wished to know
how many factors were represented in the 13 scores,
rather than in ascertaining their common factors.
For this purpose he used reliability estimates in the 


diagonal cells and factored the matrix by the cen- 
troid method. 




that they are not adding anything new and 
the factorial picture will very likely suggest 
new areas in which further variables may 
profitably be developed. 

The considerations just presented led the authors to
attempt an empirical check of the diagonal method
using reliabilities in order to gain some information
concerning how it might compare with the complete
centroid method. The results of that empirical check
will be presented after a brief description of the
mechanics of the method.


The Method 


Thurstone (1) has described the diagonal method
fully, but a brief repetition of the essential steps
may be helpful:


1. Compute the correlation matrix as usual, inserting
test reliabilities in the diagonal cells. This is at
variance with the diagonal method described by
Thurstone (1) in that guessed communalities would
conventionally be used.

2. Take the square root of the reliability which is
largest and divide every correlation in the
corresponding column by this number. The resulting
quotients are factor loadings for Factor I.

3. The variance due to Factor I is removed from the
correlation matrix by obtaining the matrix of inner
products of the first factor loadings and subtracting
them from the original correlation matrix, including
the diagonal values. Thus, if tests 2 and 3 had
loadings of .6 and .7 in Factor I, .6 × .7 = .42 would
be subtracted from the original correlation between
tests 2 and 3. The result of this operation would be
entered in the matrix of first factor residuals.

4. The entire process is repeated, each time taking
the column with the highest remaining diagonal value,
until the unextracted variance is presumed to be
largely error. This will usually be evident when the
square roots of diagonal entries begin to get small in
relation to residual column entries, resulting in
obviously inflated factor loadings. The exact point at
which factor extraction should cease, however, must
remain to a considerable extent a matter of judgment.

5. The matrix of factor loadings obtained is ready for
such rotations as may be needed to align the factors
satisfactorily. Since the factors have been extracted
in a manner designed to make them similar in nature to
certain existing variables, the number of rotations
required to finish the solution generally will be
rather small.
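
Steps 1 through 4 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' worksheet procedure: the function name `diagonal_factors`, the fixed number of factors (the text leaves the stopping point to judgment), and the three-variable matrix are all hypothetical. The diagonal of the input matrix is taken to hold reliability estimates.

```python
import math


def diagonal_factors(r, n_factors):
    """Extract factors by the diagonal method.

    r is a square correlation matrix (list of lists) whose diagonal
    holds reliability estimates. Returns the loadings, one row per
    variable, and the residual matrix left after extraction.
    """
    n = len(r)
    resid = [row[:] for row in r]  # work on a copy of the matrix
    loadings = [[0.0] * n_factors for _ in range(n)]
    for k in range(n_factors):
        # Step 2 and step 4: take the column with the largest
        # remaining diagonal value.
        p = max(range(n), key=lambda i: resid[i][i])
        if resid[p][p] <= 0.0:
            break  # nothing left to extract
        root = math.sqrt(resid[p][p])
        col = [resid[i][p] / root for i in range(n)]
        for i in range(n):
            loadings[i][k] = col[i]
        # Step 3: subtract the products of the loadings from every
        # cell, the diagonal cells included.
        for i in range(n):
            for j in range(n):
                resid[i][j] -= col[i] * col[j]
    return loadings, resid


# Hypothetical matrix: reliabilities .81, .72, .60 in the diagonal.
r = [
    [0.81, 0.54, 0.27],
    [0.54, 0.72, 0.42],
    [0.27, 0.42, 0.60],
]
loadings, residuals = diagonal_factors(r, n_factors=2)
for row in loadings:
    print([round(x, 2) for x in row])
```

In this made-up example the first factor passes through variable 1, whose loading is the square root of its reliability, and the first-factor residual between variables 2 and 3 is .42 minus .6 × .3, the same kind of subtraction described in step 3.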


Results of Two Analyses 


A previous article (4) reported the results of a
complete centroid analysis of certain questionnaire
variables used by the authors in the study of
supervision and employee attitudes in a naval
shipyard. A modified diagonal method of analysis was
also applied to the same correlation matrix, using
reliabilities in place of guessed communalities, to
determine the extent of the discrepancies which might
occur between the results of the two different
procedures. The previous article treated the nature
and significance of the factorial results, so that we
will be concerned here only with comparing the two
methods rather than in presenting the findings of the
factorial process.

In the centroid analysis, nine factors were extracted.
Seven factors were interpreted, leaving two residual
factors. Of the seven interpreted factors, three were
factors specific to the variables included in the
analysis. In the modified diagonal analysis, seven
factors were extracted, two of which were doublets.
The two solutions achieved excellent agreement on six
factors, leaving each with one factor not clearly
present in the other analysis. Table 1 presents the
comparative findings for the two studies. Loadings are
listed only where the value in one or the other
analysis was .40 or more. For information on the
nature of the variables involved, the reader is
referred to the previous report (4).

Discussion 


The total amount of labor expended in carrying out the
modified diagonal analysis was approximately
one-eighth of that required to complete the full
centroid analysis, including the rotations. Much of
the saving is in the rotation process itself, since
only a minimum of readjustment of the axes is
necessary in the modified diagonal case. In the
present study, [number illegible] graphical orthogonal
rotations were carried out with the modified diagonal
analysis, and 79 such rotations were carried out with
the centroid factors.

The agreement between the two analyses, 
as evidenced by the data in Table 1, would 
suggest that the loss in accuracy is not great 
in comparison with the time saved, provided 
the over-all objectives of the analysis are con- 
sistent with a sacrifice of this kind. 

The 25 variables for this analysis were de- 
rived from 13 dimensions or item pools by di- 
viding each dimension into comparable halves 
or sub-dimensions on the basis of item con- 
tent. One sub-dimension was discarded be- 
cause of lack of item homogeneity. The cor- 
relations between comparable halves were 
used as the reliability estimates to be in- 
serted in the diagonal cells. For 15 of the 
variables this correlation between halves was 
their highest correlation. Since the highest 
correlation in the column was used as the 
communality estimate in the centroid analy- 
sis, 15 of the diagonal cell values were the 
same for both analyses. This favored a simi- 
lar outcome in both analyses. 

Inspection of Table 1 reveals that many of 
the differences in loadings between the two 
analyses may be attributed to discrepancies 
in the amount of variance extracted. These 
differences in extracted variance are revealed 
by comparison of the sums of squares of fac- 
tor loadings for each variable in the two 
analyses. Variables 1 and 2, for example, 
had low diagonal entries for the modified di- 
agonal analysis because the reliability esti- 
mate obtained by correlating variables 1 and 
2, which were supposedly comparable half-dimensions,
was low. Evidently the low reliability estimate was
due, in some degree, to lack of comparability between
halves; the communalities for the centroid analysis
were higher. Had higher values been inserted instead
of the reliability estimates actually used, the
loadings for variables 1 and 2 would have been higher.
It is expected that agreement of factor loadings using
these two methods will be greater for variables where
the amount of common factor variance approaches that
of the true variance.


Table 1

Comparative Factor Loadings *

[Column alignment was lost in the scan; the heading
row and the rows for variables 1 to 25 are given as
scanned.]

Me Wa O LUU O IVeIVa Ve Va Vie Vy VIE VIa = 31
40 13 51 r
1 a 2 45 77
2 76
3 78 85 4 76
4 83 84
60 66
5 49 50 40 53 ot 73
6 38 44 25 44 51 55 62 6
7 1 i 31 40 o
8 5 4
1,09
8
9 51 87 6 8
10 55 58 s7 46
11 44 46 59 42
12 4 43 ga Bt
13 70 74 45 35 gi S
14 65 70 45 45 118 9
15 877 17 21 2 79
16 65 75 ag 60
17 6 63 A
18 65 61 24 46 a 79
19 68 6l 5 u 73 6
20 55 63 Bip
2173-68 K
82
22 2r 50 70 57 s
23 56 60 a 5
yE A
24 68 69 57 70
25 65 75 ; 7

* Questionnaire variables are numbered down the left
side of the table. Factors are numbered across the
top, with the centroid (c) and diagonal (d) analyses
placed side by side for each factor. Loadings are
listed only in cases where the loading obtained by one
of the two methods is .40 or more.


Summary

Occasions arise where it is desirable to apply factor
analytic techniques, but the exploratory nature of the
work and the time available may not justify a complete
centroid analysis. A diagonal method, modified by
using reliabilities instead of guessed communalities
in the diagonal cells, is suggested as a satisfactory
and economical substitute for the complete centroid
analysis under certain conditions. The results of an
empirical comparison of this method with the complete
centroid method on one correlation matrix show that
the two agreed fairly closely on most of the factors
obtained.

Received July 21, 1953.


References

1. Thurstone, L. L. Multiple factor analysis.
Chicago: The University of Chicago Press, 1947.

2. Thurstone, L. L. The dimensions of temperament.
Psychometrika, 1951, 16, 11-20.

3. Wherry, R. J., Campbell, J. T. and Perloff, R. An
empirical verification of the Wherry-Gaylord iterative
factor analysis procedure. Psychometrika, 1951, 16,
[pages illegible].

4. Wilson, R. C., High, W. S., Beem, H. P., and
Comrey, A. L. A factor-analytic study of supervisory
and group behavior. J. appl. Psychol., 1954, 38,
89-92.





A Methodological Study of Cigarette Brand Discrimination 


Richard A. Littman 


University of Oregon 


and 
Horace M. Manning * 


University of Minnesota 


There is a common belief that many consumer goods are
identical, at least in the sense that they are
indistinguishable once the wraps are off. Recent
studies by Pronko and his associates (3, 4) seem to
have demonstrated this with respect to cola beverages;
the case for cigarettes seems to have been settled the
[same] way in one set of investigations (1, 2) and
positively in a recent study (5). However, many of the
studies are not conclusive because of errors of
procedure or analysis.

The reasons for objection differ for the different
studies. They reduce, however, to a question of the
type of judgment asked of Ss. For example, let us
consider the studies of Pronko and his associates
(3, 4). In these studies, various cola brands were
administered to Ss who were asked to identify them by
name. In two studies with popular varieties of cola,
discrimination was random, i.e., names were applied on
a chance basis. Consequently, in a third study three
obscure brands were used. Not only were they not
properly identified; instead the names of the major
brands were applied, also in a random fashion. The
authors conclude that “the seven brands of cola
beverages employed in our series of studies appear to
have the same stimulus function for our subjects and
may be said to be ‘equivalent stimuli’” (3, p. 608).

This conclusion seems unwarranted, for even if there
were a discriminable difference among the different
colas, Ss could hardly express their awareness of this
difference if they were unfamiliar with the names to
be applied to them. It may be that the colas do indeed
have equivalent stimulus values, but the use of an
identification, as in a [illegible], does not settle
the case for this conclusion. If Ss apply names with
better than chance accuracy, one may conclude that
discriminable differences exist; but random results do
not justify the conclusion that such differences do
not exist. For example, what would the results have
been if Ss were asked to respond by “same” or
“different”? There is no a priori reason to expect
comparative judgments to yield the same results as
identification judgments.

1 The data were collected by Mr. Manning for a
Master’s thesis at the University of Oregon.

This importance of the type of judgment 
used suggests that discriminatory ability in 
this sort of study is a function of test pro- 
cedures as well as test materials. The field 
of psychophysics provides ample evidence 
that this is the case. The questions raised 
here merely spell out the well-founded gen- 
eralizations concerning the relationships be- 
tween discrimination and procedures. 

In the present study, two specific questions 
were raised: (1) Is there evidence that indi- 
viduals can discriminate among different ciga- 
rette brands? (2) Are there differences in 
the patterns of judgment for two different 
kinds of discriminations, viz. recognition and 
affective? 

The idea of comparing affective and recog- 
nition judgments is based on the following 
consideration. While the ability to apply a 
name correctly to something requires some 
specific training, the ability to say whether 
one likes or dislikes something ordinarily does 
not. To be sure, likes and dislikes may be 
radically changed as a result of experience, 
as may the willingness to say one likes some- 
thing. Even the very nature of the qualita- 
tive experience may be altered as a result of 
such experiences. It seems, nevertheless, that
at any given time almost any object can be 
responded to affectively, even in the absence 


of past contact with that object. 
It may be this very universality that permits
affective experiences to play such a
great role in behavior; on one’s first contact 
with some particular object, say a wine or 
tobacco, the only reaction he may have avail- 
able is an affective one. With this in mind, 
it was hypothesized that one of the things 
that determined preferred brands would be 
one’s affective response to them, and there- 
fore one’s usual brand would tend to be liked 
more than other brands. For any given sam- 
ple of objects, this does not exclude the pos- 
sibility that other objects may be liked even 
more, or that all of these objects may be 
equally liked. 

While this hypothesis might well have been
wrong, if any differential association between
brands and affective judgment appeared, the 
use of affective judgments would still merit 
consideration as the possibility of a new way 
of approach to some common problems of 
discrimination. Indeed, the “method of im- 
pression” lies at the heart of psychophysics. 
However, the use to which it is put here is 
somewhat unusual since it is used as a “sensi- 
tivity” test rather than a simple preference 
test. Between the limiting cases of 100% 
“like” and 100% “dislike,” judgments of this
sort might be used where it is difficult or
impossible to develop in Ss any other system of
reporting their discriminations. 

Originally we had conceived of comparing 
the recognition and affective judgments for 
accuracy. This has proved possible only in 
the roughest way because of the logical prob- 
lems attending the definition of “accuracy” 
for the affective judgments. In a sense, affec- 
tive judgments cannot be veridical instru- 
ments; one may have an affective experience, 
but one does not have a correct affective ex- 
perience. But one can properly seek to cor- 
relate such experiences with known differ- 
ences like preferences or brands, So, when 
we speak of accuracy the reader may substi- 
tute “sensitivity” in the case of affective judg- 
ments without serious harm to the thread of 
the discussion. If this phase of the study is 
thought of as a methodological inquiry into 
different response procedures permitted Ss, 
then while accuracy may be asserted only of 
the recognition judgments, sensitivity may be 
asserted of the results of such judgments in 
terms of differential association with some 




known differences, viz. brand preference and 
brand differences. In other words the methods
are being studied in terms of their sensitivity
or accuracy, not the judgments themselves.

Technical problems prevented the study’s being
carried out with colas so that one cannot say
anything about cola discrimination on the basis of
these results. But the logic of the argument should
be generalizable to discrimination studies in any
modality, and that is our main objective.


Procedure A 

$ . ni ve! 

Materials. Three brands of cigarettes, fke. 
used: Camel, Chesterfield, and Lucky 


a y SUL 
These were selected because a peal 7 
vey indicated that they accounted for abo 39 


per cent of the brands used by a sample of iip 
students at the University of Oregon. ed be 


Morris ran a close fourth, but was not us! 
cause of the difficulty in concealing iden 
print. It will be seen that the method 
vestigation is not seriously affected by this were 
sion. The students involved in the survey main 
not the same ones who participated in the 

study. „med 
Each cigarette was banded with a gu this 


tifying 
of in 
omis” 


label applied in a tacky state, While nd 
method, which was also used by Ra es 
Rachal, and Marks (5), undoubtedly te.so 


Some cues usually used by smokers, ae was 
round, so firm, so fully packed... > ? 
felt that the procedure of blindfolding pt 
more serious difficulties, Each S, ee 
smoked a cigarette whose upper half was S ci 
by a paper label, but whose lower half was 


sent 
TA 

efor! 

red 


ae 


a s noe 
Will all of you who are regular cigarette a indi; 
raise your hands? ,. . Will you file isi the 

that when one retur en ¥ 


ji whic’ of 
The experiment, “tes 0 


not unpleasant, should take about five miNU 
your time. Thank you very much.” . 9 
For reasons made clear under Routine, not all of
these Ss were utilized in the analysis.

Routine. Ss were divided into two equal groups by
alternating assignments as they entered the
experimental room. Each S was greeted in the
following manner:


“Come in. Just have a seat [...].
Now, I’m going to give you a cigarette. I’d


Study of Cigarette Brand Discrimination 


like you to put it in your mouth, and when you’re
ready, I’ll light it for you. (E hands cigarette
to S.) Take three or four puffs, more if you
like, just as you normally would,

(A) then tell me whether you like or dislike it.
(B) then tell me whether you think it is or is
not the brand you usually smoke.

“Ready? . . . Here’s your light.” (E lights
cigarette for S, waits till S indicates judgment is
made, then takes cigarette and records response.)
“Do you have a package of cigarettes with you?
Can I see it? Is this the brand you usually
smoke?” (E records response.) “That’s all and
thank you very much. Since the success of this
experiment depends in part upon people not
knowing what to expect when they enter, I’d
appreciate it if you didn’t discuss the procedure
with your friends.”
The Ss assigned to the affective group were
read the (A) portion of the instructions; the
recognition group was read the (B) statement.
In all other respects treatment was identical.
There are four things especially to note about
the instructions and the procedure.

1. E did not know what brand of cigarette S
smoked during the experiment until the latter had
left the experimental room. The cigarettes had
previously been masked and placed together. One
was arbitrarily selected from a paper sack and
given to E.
2. Each S made only one judgment. Previous
studies have erred seriously in this respect. The
statistics available for analyzing situations of this
sort invariably call for independence of measurements.
The repetition of tests upon the same S
is frequently unanalyzable because of the impossibility
of computing any meaningful coefficient
to express the relationship between their judgments.
In addition, it seems unlikely that the
methods of adapting S by having him wash
out his mouth between drinks or tasting a mint
between tests are entirely satisfactory, unless one
can be sure that there is no cumulative effect of
the neutralizer. Finally, if the materials to be
discriminated are in fact discriminable, it must
be demonstrated in some independent way that
the “trace-reducer” does not have a differential
effect upon the trace. If such a differential
interaction actually exists, none of the previously
reported studies is designed so that its effect
could be evaluated as a source of error.

3. Ss reported their reaction by stating only
whether they liked or disliked the cigarette or
whether it was the brand they usually smoked.
The reason for this technique is as follows. A
like-dislike judgment is disjunctive; recognition
judgment in the form of a naming response
is not, though it could be converted into
one. In order to keep the response as uniform
as possible, however, Ss were asked to make a
disjunctive identification. It should be noted
that they were not given any information
concerning the brands offered to them. For all they
knew, they might have been getting “off brands.”
The similarity of this portion of the procedure
to that of Pronko and his colleagues is obvious.
Not giving S a set of possibilities to draw on
prevents the computation of a chance level of
success; this will be more fully discussed in the
Analysis.

4. After S made his judgment he was requested 
to exhibit a pack of cigarettes, and indicate 
whether it was the brand he usually smoked. 
As a result of this step, 42 Ss were eliminated 
from the analysis; either they did not smoke one 
of the three brands selected for study, or they 
had no cigarettes with them. While some regu- 
lar smokers were undoubtedly excluded by this 
tactic, the likelihood of getting the occasional 
smoker seems small. As a matter of experi- 
mental procedure, it seems fair only to include 
individuals who may be expected to show the 
maximal degree of discrimination, i.e., the regu- 
lar smoker or drinker. Finally, the use of this 
device ensured complete obscurity of the test 
stimuli from the Ss; to have indicated prefer- 
ence for smokers of certain brands alone would 
have cancelled the precautions to make the rec- 
ognition judgment as comparable to the affective 
one as possible. 

Test Locus. For all but 15 Ss, the experiment 
was carried out in the same room. It was high- 
ceilinged, with a large window, a single ceiling 
fixture, two chairs and two tables. The room 
was thoroughly aired out before the next subject 
was admitted. All work was done during day- 
light hours. The remaining 15 Ss were studied 
in another room similarly furnished. Since they 
were distributed evenly (save for one) between 
the two groups, if there were any measurable ef- 
fect attributable to the difference between the 
rooms it could serve only to increase the vari- 
ance of the two distributions. It was decided, 
therefore, to leave this possible source of vari- 
ance in the error term of the statistic. 


Results 

In Table 1 the results for the recognition 
and like-dislike judgments are presented. In 
each cell, the top number represents the total 
number of “yes” or “like” responses, respec- 
tively, and the lower number, the total num- 
ber of observations in this category. Thus 
the bottom numbers represent the total of 
“yes” plus “no” responses, and of “like” plus 
“dislike” responses, respectively, in each cate- 
gory. As will be seen, the consigning of any 
observation to a particular category depends 
on three variables, viz. the brand of cigarette 
S regularly smoked, the brand he sampled, 
and which type of judgment he was asked to 
make. 


Table 1
Frequency of Judgments

Recognition
                        Brand Preference
Brand Sampled           Ca     Ch     LS
Ca          Yes          7      3      3
            Total       17     11     15
Ch          Yes          4      3      2
            Total       16     10     16
LS          Yes          3      3      7
            Total       16      9     16

Like-Dislike
                        Brand Preference
Brand Sampled           Ca     Ch     LS
Ca          Like        10      4      4
            Total       14      9     16
Ch          Like         7      5      5
            Total       17     10     14
LS          Like         8      3     10
            Total       14      9     17

It is readily seen that the design of this 
study permits an analysis of variance tech- 
nique. Therefore a three-way analysis was 
carried out. Independence of observations is 
assured by having only one judgment per S. 
The raw data are frequencies and are un- 
usable in that form. They were converted 
into a relatively normalized distribution by 
means of the arcsine transformation (2, 6). 
While there are variations in the number of 
observations per cell, comparison of the theo- 
retical vs. the obtained residual variances in- 
dicated little damage to the resulting analysis. 
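
The transformation step can be sketched as follows. This is only an illustration under stated assumptions: the cell counts are as read from Table 1, the angular transformation is taken to be arcsin(sqrt(p)) expressed in degrees, the array is indexed [judgment, row, column] with the row/column orientation assumed, and the function names (`arcsine_degrees`, `margin_ss`, `three_way_ss`) are hypothetical helpers, not anything described by the authors.

```python
import numpy as np

# Cell counts as read from Table 1, indexed [judgment, row, column].
# judgment 0 = recognition ("yes" counts), judgment 1 = like-dislike ("like" counts).
favorable = np.array([
    [[7, 3, 3], [4, 3, 2], [3, 3, 7]],
    [[10, 4, 4], [7, 5, 5], [8, 3, 10]],
], dtype=float)
observed = np.array([
    [[17, 11, 15], [16, 10, 16], [16, 9, 16]],
    [[14, 9, 16], [17, 10, 14], [14, 9, 17]],
], dtype=float)


def arcsine_degrees(p):
    """Angular transformation arcsin(sqrt(p)); degrees are an assumed unit."""
    return np.degrees(np.arcsin(np.sqrt(p)))


def margin_ss(y, keep):
    """Sum of squares of the means over the axes in `keep` about the grand mean."""
    drop = tuple(axis for axis in range(y.ndim) if axis not in keep)
    means = y.mean(axis=drop)
    weight = y.size / means.size  # number of cells averaged into each mean
    return float(weight * ((means - y.mean()) ** 2).sum())


def three_way_ss(y):
    """Partition the total sum of squares; the three-way term is the remainder."""
    ss = {
        "C": margin_ss(y, (0,)),
        "rows": margin_ss(y, (1,)),
        "cols": margin_ss(y, (2,)),
    }
    ss["C x rows"] = margin_ss(y, (0, 1)) - ss["C"] - ss["rows"]
    ss["C x cols"] = margin_ss(y, (0, 2)) - ss["C"] - ss["cols"]
    ss["rows x cols"] = margin_ss(y, (1, 2)) - ss["rows"] - ss["cols"]
    total = float(((y - y.mean()) ** 2).sum())
    ss["C x rows x cols"] = total - sum(ss.values())
    ss["total"] = total
    return ss


angles = arcsine_degrees(favorable / observed)
ss = three_way_ss(angles)
for name, value in ss.items():
    print(f"{name:>16}: {value:8.1f}")
```

With one transformed value per cell, the three-way interaction is simply what remains after the main effects and two-way interactions are removed, which is why it can serve as an error term.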

Table 2 shows the data converted to pro- 
portions, and Table 3 summarizes the result- 
ing analysis of variance. A problem always 
arises concerning the proper error term in 
evaluating the significance of the various ef- 
fects. We have tested for AB, AC, and BC 
interaction by the F-ratio, with ABC inter- 
action as the denominator. AC and BC are 
not significant at the five per cent level, while 
AB seems high enough to warrant another 
test. Consequently we have pooled the AC
and BC interactions with the ABC interaction
in order to obtain more degrees of freedom
and a more powerful test. The F test
for AB interaction then becomes:


. 7 = 5.320 
= 22.17, against" Fo.os (1, 8 df) = 59 


Since the AB interaction is significant, the
appropriate test for the A-effect is:

$$\frac{56.5}{136} = 0.415, \quad \text{against } F_{.05}\ (2, 4\ \text{df}) = 6.94,$$

while that for the B-effect is:

$$\frac{42.5}{136} = 0.312, \quad \text{against } F_{.05}\ (2, 4\ \text{df}) = 6.94.$$
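
The arithmetic behind these ratios can be sketched as below. The sums of squares and degrees of freedom are as read from Table 3; the B sum of squares is taken as 85, the value implied by its mean square of 42.5 on 2 df. The dictionary layout and the helper names are illustrative assumptions only.

```python
# Entries as read from Table 3: (sum of squares, degrees of freedom).
table3 = {
    "A": (113.0, 2),    # brand sampled
    "B": (85.0, 2),     # brand preferred (SS implied by mean square 42.5)
    "C": (571.0, 1),    # recognition vs. like-dislike
    "AB": (545.0, 4),
    "AC": (78.0, 2),
    "BC": (4.0, 2),
    "ABC": (124.0, 4),
}


def mean_square(effect):
    """Mean square = sum of squares / degrees of freedom."""
    ss, df = table3[effect]
    return ss / df


def pooled_error(effects):
    """Pool several terms into one error term; returns (mean square, df)."""
    ss = sum(table3[e][0] for e in effects)
    df = sum(table3[e][1] for e in effects)
    return ss / df, df


# Pool AC and BC with ABC to gain degrees of freedom for the error term.
error_ms, error_df = pooled_error(["AC", "BC", "ABC"])
f_ab = mean_square("AB") / error_ms
f_c = mean_square("C") / error_ms

# With AB significant, the main effects A and B are tested against AB.
f_a = mean_square("A") / mean_square("AB")
f_b = mean_square("B") / mean_square("AB")

print(f"pooled error: {error_ms:.2f} on {error_df} df")
print(f"F(AB) = {f_ab:.2f}, F(C) = {f_c:.2f}")
print(f"F(A) = {f_a:.3f}, F(B) = {f_b:.3f}")
```

Testing A and B against the AB mean square, rather than against the pooled error, reflects the decision stated in the text once AB is judged significant.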


The following significant effects emerged:
AB interaction and C. What does this mean?
The interaction of brand preference and
brand sampled, AB, results in a change not
attributable to either variable alone. That
is, Ss did like their own brands more often
or identify them as their own more often than
they did other brands. One may conclude
from this that there exist differences in
the brands such that regular users of these
brands can distinguish among them.


Table 2
Proportions of Judgments

Recognition
                      Brand Preference
Brand Sampled        Ca       Ch       Ls
Ca                 0.412    0.273    0.200
Ch                 0.250    0.300    0.125
Ls                 0.187    0.333    0.437

Like-Dislike
                      Brand Preference
Brand Sampled        Ca       Ch       Ls
Ca                 0.714    0.445    0.250
Ch                 0.412    0.500    0.357
Ls                 0.571    0.333    0.588


Table 3
Analysis of Variance: Three Way

                               Sum of           Mean
                               Squares    df    Square    F-ratio
A-effect
  (Brand Sampled)                113       2     56.5
B-effect
  (Brand Preferred)               85       2     42.5
C-effect
  (Recognition vs.
  Like-Dislike)                  571       1    571.0
AB                               545       4    136.0     4.39*
AC                                78       2     39.0     1.26†
BC                                 4       2      2.0     0.0645†
ABC                              124       4     31.0
Total                          1,520      17     89.4

* Vs. F.05 (4, 4 df) = 6.39.
† Vs. F.05 (2, 4 df) = 6.94.


But what of the other hypothesis outlined,
viz. that the affective judgment technique is
more sensitive or accurate than the use of
recognition judgments? In certain respects
our findings are surprising. The proportions
for the two kinds of judgments are different,
i.e., the ratios of “like-dislike” and “my
brand, yes-no” responses are different. These
different ratios cannot be directly compared,
however, to determine the relative accuracy
of the two kinds of judgments. In order to
do this it would be necessary to determine
the level of chance expectancy; this cannot
be done for the present design where Ss
were unaware of the possibilities available to
them.

However, it is possible to re-analyze the
data in order to estimate the relative accuracy
of the two kinds of judgments. This
is done by constructing a two-way table for
each of the judgments separately. The results
are shown in Table 4. It will be seen
that A- and B-effects are not significant in
either of the analyses. This is similar to the
finding in the three-way table above, and
again indicates no significant effect due either
to brand sampled or brand preference alone.
But the three-way analysis indicated that
AB interaction was present. There is
no way of evaluating the AB interaction for
the two-way analyses, but they can be evaluated
against one another:

$$\frac{\text{AB (like-dislike)}}{\text{AB (recognition)}} = \frac{97.0}{70.2} = 1.38, \quad \text{against } F_{.05}\ (4, 4\ \text{df}) = 6.39.$$


We conclude that there is no difference be- 
tween AB interaction on like-dislike or recog- 
nition judgments, i.e., neither one is signifi- 
cantly superior as a means of distinguishing 
between brands. It should be pointed out 
that there is a tendency for the like-dislike 
judgments to be superior. With samples as 
large as these, however, it hardly seems large 
enough to be of practical significance.


Discussion 


In the introduction, the use of a naming pro- 
cedure was taken to task. The essence of the 
argument, it will be recalled, was that even if 
two or more stimuli were appreciably different to 
S, he might not be able to express his awareness 
of this difference if the task demanded of him be 
one of applying names to the stimuli. To test 
this, Ss were presented with either a recognition 
or an affective assignment. Both seemed about 
equally accurate. 

Table 4
Analysis of Variance: Two Way

                       Sum of          Mean
                       Squares   df    Square    F-ratio
Recognition Judgments
A-effect                  21      2     10.5     0.149*
B-effect                  61      2     30.5     0.434*
AB interaction           281      4     70.2
Total                    363      8     45.4

Like-Dislike Judgments
A-effect                 170      2     85.0     0.876†
B-effect                  28      2     14.0     0.144†
AB interaction           388      4     97.0
Total                    586      8     73.6

* Tested against AB interaction for recognition.
† Tested against AB interaction for like-dislike.


Now it is certainly true that to name something
is to identify it. However, to identify
something does not require that it be named.
In other words, the process of recognition is
broader than that of appellation. In either case,
it is clear that there is some other process which
we may call discrimination upon which recognition
rests, logically if not temporally. The success
of the affective judgment attests to this.
There is, to be sure, the likelihood that distinc- 
tions between test objects which are based upon 
affective judgments involve entirely different 
kinds of discriminations from those based upon 
recognition. The fact that the proportions of the
two kinds of judgments differed indicates that we 
did not have a single process. The tendency for 
like-dislike to be superior is further support along 
these lines, though the difference is obviously of 
theoretical rather than practical import, being of 
such a small magnitude. It is further likely that 
different kinds of recognition, e.g., naming or 
“sorting,” involve different processes in part.
After all, to name something properly requires a 
greater amount of precision in using the various 
cues provided by the stimulating object or event. 
But it is obvious that a sound investigation in 
an area of this sort requires that such distinc- 
tions be kept in mind. 

The study seems to have demonstrated, also,
that one need not be restricted to “cognitive”
reactions to objects in order to test for
discrimination of an object’s properties. [...]
affective reactions tap different properties; but in the
field of discrimination where in field situations
[...] identification, recognition, etc. may have
an affective base as well as the usually assumed
[...]sible for the affective judgment. [...]
a substantial correlation between the two types
of judgments can be demonstrated ([...]
most likely), one may have a rapid technique
for determining an S’s sensitivity [...]
where this approach is applicable.
There remain, as always, many questions con- 
cerning the type of judgmental situation that 
should be used. The answer to such questions, 
we believe, can best be formulated after the
objective of the investigator has been stated as
precisely as possible. At such a time the
particular variables to be manipulated should emerge
more clearly. In any case, it is obviously de- 
sirable that a person be capable of making a 
naming reaction before such a naming or other 
differential reaction is used to decide whether a 
discrimination is possible. Similar errors can be 
avoided if one thinks of discrimination as being 
partly a function of materials, partly of
procedure, and if one thinks of it, further, as
existing in various degrees of precision or exactness.


Summary 


It was proposed that the use of a recognition
judgment, as employed in previous
studies of cigarette and cola brand discrimination,
is not the most appropriate test of
discriminability. Therefore, in the present
study the use of a recognition judgment and
an affective (like-dislike) judgment were compared.
A total of 246 regular cigarette
smokers were divided by alternation into
two groups. Members of one group made a
recognition judgment, the other a like-dislike
judgment. Each S was given one of the “Big
Three” cigarettes with brand name obscured.

Results, analysed by analysis of variance
using the arcsine transformation, were as
follows:

1. Both types of judgment were made with
better than chance accuracy.

2. The like-dislike judgment technique was
slightly more sensitive than the recognition
judgment, but not significantly so.

3. The distribution of responses (dichotomous
in both cases) for each type of judgment
was radically different, suggesting that,
while about equally sensitive as applied to
the present problem, each is an expression of
a different type of psychological function.

4. It was suggested that the use of affective
judgment may have a greater applicability
to problems of discrimination than it
has enjoyed, and merits further study in this
respect.

Received July 7, 1953. 


References

1. Husband, R. W., and Godfrey, J. An experimental
   study of cigarette identification. J.
   appl. Psychol., 1934, 18, 220-223.
2. Mood, A. M. Introduction to the theory of
   statistics. New York: McGraw-Hill, 1950. P.
   346.
3. Pronko, N. H. and Bowles, J. W., Jr. Identification
   of cola beverages. (In three parts.)
   J. appl. Psychol. (1) 1948, 32, 304-[...]; (2)
   1948, 32, 559-562; and (3) 1949, 33, [...]-608.
4. Pronko, N. H. and Herman, D. T. Identification
   of cola beverages. IV: Postscript. J. appl.
   Psychol., 1950, 34, 68-69.
5. Prothro, E. T. Identification of American, British,
   and Lebanese cigarettes. J. appl. Psychol.,
   1953, 37, 54-56.
6. Ramond, C. K., Rachal, L. H., and Marks, M. R.
   Brand discrimination among cigarette smokers.
   J. appl. Psychol., 1950, 34, 282-284.
7. Snedecor, G. W. Statistical methods. Ames:
   Iowa State University Press, 1940. P.
   383.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 3, 1954


Pointing Accuracy of a Joy Stick Without Visual Feedback 


W. D. Garvey and W. B. Knowles ¹


Naval Research Laboratory, Washington, D. C. 


The following problem was suggested by a
situation in which it was necessary
to know how accurately an operator could
point a joy stick at a target without visual
feedback as to the position of the stick. The
problem as it was investigated may be described
as determining man’s ability to point a
joy stick at a series of small points of light
displayed in the space of an otherwise totally
dark room.

Procedure 


Apparatus. A general pictorial representation
of the apparatus is presented in Figure 1. The
joy stick was an aluminum rod, one-half inch in
diameter, 6.5 inches long, mounted on the shafts
of potentiometers. When the stick was
pointed at a target, the horizontal and vertical
components of the stick’s position were converted
by the potentiometers into voltage readings
which were calibrated in terms of degrees
of deviation in azimuth and elevation.

The joy stick was located on a pedestal in the
center of the dark room 68 inches from the forward
wall, above the floor, and below the ceiling.
The Ss sat in a chair on a platform immediately
behind the pedestal so that the joy
stick was at about stomach level; they were
instructed to grasp the stick in the right hand,
palm [...] and thumb extended forward along the
top of the stick.
The targets were 24 stationary small lights located
at predetermined positions about the room.
Since the room was totally dark and the brightness
of the targets was very low, Ss were able to
detect the target with little knowledge of the relative
distance involved. The targets may be regarded
as lying on the surface of a sphere of unspecified
diameter, the center of which was located
at the joy stick. When the joy stick was
pointed directly ahead of S it indicated the zero
point for the azimuth and elevation dimensions
of the targets. This center point (C in
Figure 1) was denoted by a red cross (illuminated
between trials only) which was mounted
at the center of the forward wall, 68 inches from
the floor. The positions of the targets were located
in terms of degrees of azimuth and elevation
from point C. These positions are given in
Table 1. The schema of these positions may be
interpreted with the aid of Figure 1; e.g., target
No. 9 (labeled as such in Figure 1) was located
16° below and 16° to the right of point C. The
position of each of the other targets may be
similarly interpreted.

¹ The authors wish to acknowledge the valuable
assistance of Mr. Manus Munger and Mr. Gerald
[...], formerly at this Laboratory, who assisted
in the experiment and carried out the major portion
of the data analysis.

Table 1
Target Positions and Response Errors
N = 28

           Target Position*       Response Errors
              (Degrees)              (Degrees)
Target
No.      Elevation   Azimuth       Mean     S.D.
 1          -74        +56         10.3      7.2
 2          -74        -45         10.4      6.7
 3          -46        +79         16.0      9.5
 4          -46        +16         13.9      [?]
 5          -46        [?]         15.0      7.8
 6          -46        -74         15.7      7.6
 7          -14        +76         12.8      5.8
 8          -15        +44         11.8      7.3
 9          -16        +16          8.8      4.6
10          -16        -14          9.9      6.0
11          -14        -45          8.7      4.9
12          -14        -74         10.7      [?]
13          +16        +75         16.9     10.3
14          +15        +45         11.9      5.7
15          +15        +15         12.3      8.2
16          +15        -15         10.9      7.5
17          +15        -42         12.4      6.0
18          +15        -73         13.9      7.3
19          +47        +75         20.5     10.0
20          +47        +15         13.6      8.0
21          +47        -15         [?]       6.6
22          +47        -74         17.9     11.3
23          +75        +45         18.2      9.4
24          +73        -40         17.1      7.8

* A plus sign indicates upward elevation and right
azimuth; a minus sign indicates downward elevation
and left azimuth.


Method. The Ss were seven Naval enlisted
men stationed at the Laboratory to serve as subjects;
all Ss were right-handed. Without previous
dark adaptation Ss would have had better
visual acuity at the end of an experimental trial
than at the beginning. Since it was desired to
maintain constant the visual component of the
task, Ss were given 20 minutes of dark adaptation
before an experimental session.

The task was considered to be a relatively un- 
familiar one for Ss. Therefore, before the ex- 
periment proper, each of the seven Ss was given 
two practice periods (one per day). This prac- 
tice amounted to four trials of pointing the stick 
at each of the targets. 

The experiment proper began on the day fol- 
lowing practice. The Ss were given four experi- 
mental trials, two per day for two successive 
days. An experimental trial consisted of one 
presentation of each of the 24 targets. The targets
were presented to Ss in a different randomized
order each trial.

The S was given as much time as was needed 
to make the pointing response, and was in- 
structed to report when he considered the stick 
to be pointing at the target. The E immediately
recorded the position of the stick, extinguished
the target light, illuminated the center cross, and
then instructed S to return the stick to the
center position. The next aiming response was
initiated by having S move the stick from this
centered position to a position of pointing at a 
new target. Thus each aiming response began
and ended by having the stick pointed at the
center cross.

The S’s pointing responses were measured in
terms of degrees of elevation and azimuth deviation
of the stick’s pointing position from the target’s
position. This elevation and azimuth deviation
was later transformed into a great circle
deviation, which was a measure of the direct
angular displacement between the pointing position
of the stick and the position of the target.

The Ss were never given any knowledge of the
correctness of their responses or the direction of
their errors.
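
One way such a great circle deviation could be computed is sketched below. The authors do not give their formula; the coordinate convention (azimuth measured about the vertical axis from straight ahead, elevation measured up from the horizontal) and the function names are assumptions made only for illustration.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Direction of a (azimuth, elevation) pair as a unit vector.

    Assumed convention: x = right, y = up, z = straight ahead toward point C.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.sin(az),
        math.sin(el),
        math.cos(el) * math.cos(az),
    )


def great_circle_deviation(target, response):
    """Angle in degrees between two (azimuth, elevation) directions."""
    t = unit_vector(*target)
    r = unit_vector(*response)
    dot = sum(a * b for a, b in zip(t, r))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


# Hypothetical response: target No. 9 (16 deg right, 16 deg below C),
# with the stick recorded 5 deg too far right and 4 deg too high.
target_9 = (16.0, -16.0)
hypothetical_response = (21.0, -12.0)
print(round(great_circle_deviation(target_9, hypothetical_response), 2))
```

Combining the two components into a single angle avoids the distortion that arises if azimuth errors at high or low elevations are treated as equal in size to the same azimuth errors near the horizon.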


Results

Magnitude of Errors. The mean error for
all Ss and all targets was 13.4°, with a range
for single stimuli from 1° to 52°. Table 1
presents the magnitude of errors to the respective
targets and the respective standard deviations.
Mean errors with respect to location are presented
in Figure 2. The data indicate that
response errors to some targets were greater
than to others. The smallest errors appear
to have been made to targets just below the
center of the room (i.e., targets 7 through 12,
all approximately 15° below the center) and
the extremely low targets (i.e., targets 1
and 2, approximately 75° below the center).
Generally speaking, the largest errors were
made to the targets located on the extreme right
and above the center of the room; next
largest errors were made to targets on the
extreme left and above. Statistical analysis
indicated that response errors to targets
above the center of the room were significantly
greater (p < .05) than those below
the center; in addition, there was a propensity
for right response errors to be greater
than left response errors (p < .10).

Fig. 1. Schematic representation of experimental situation. C = center point;
9 = position of target No. 9; S and hand stick are located in
center of room.

Fig. 2. Diagrammatic representation of magnitude and direction of
response errors. Mean errors, in degrees of arc deviation from target,
are presented for each target relative to center point (C); significant
directional error deviation is indicated by direction of pointed arrow.

Direction of Errors. There was a tendency
for Ss to err in a particular direction for specific
targets. Statistical sign tests ² were
applied to the direction of Ss’ response errors.

² Dixon, W. J. and Mood, A. M. The statistical
sign test. J. Amer. statist. Ass., 1946, 41, 557-566.


The arrows in Figure 2 give an indication of 
the direction of the errors made to a specific 
target; for example, the upward pointing 
arrow at target No. 4 indicates that a sta- 
tistically significant (p < .05) number of Ss’ 
responses were in the direction of the stick 
pointing above the target. The right-left 
arrows may be similarly interpreted. There 
were two directional tendencies in elevation 
errors. For responses made to targets below 
the center and 15° above the center there was 
a tendency for Ss to aim above the targets. 
However, for those targets 45° or more above 
the center there was a tendency for Ss to aim 
below the targets. There also appear to be 
two directional tendencies in azimuth errors. 
In general, Ss tend to aim to the left of tar- 
gets located on the extreme-right and to the 
right of targets on the extreme left; i.e., Ss 
are disposed to undershoot the targets on the 
extremes. However, there is also a tendency 
for Ss to aim to the left of targets located 
just to the left of the center of the room and 




to aim to the right of targets located just to 
the right of the center of the room. For these 
targets located around the center, Ss appear 
to overshoot the targets. 
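
A sign test of the kind referred to above can be sketched as an exact two-sided binomial test on the number of responses falling on one side of a target. The counts used here are hypothetical, not taken from the study, and the function name is an assumption for illustration.

```python
from math import comb


def sign_test_p(n_plus, n_minus):
    """Exact two-sided sign test p-value.

    n_plus and n_minus are the counts of errors in each direction
    (ties are assumed to have been dropped beforehand). Under the null
    hypothesis each direction is equally likely.
    """
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    # Upper-tail probability of k or more in the commoner direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical example: of 28 responses to one target,
# 21 fell above the target and 7 below it.
p_value = sign_test_p(21, 7)
print(f"two-sided p = {p_value:.4f}")
```

The test uses only the direction of each error and ignores its size, so it asks whether Ss aim consistently to one side, not how far off they are.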


Discussion 


The results indicate that more than 90% 
of Ss’ pointing responses were within 25° of 
the targets. In most applied situations, where 
joy stick control is feasible, such accuracy is 
adequate. It is clear, however, that pointing 
accuracy is partially determined by the locus 
of the targets. The fact that Ss were re- 
quired to hold the stick in a particular fashion 
is certainly an influential factor in determin- 
ing the differential pointing accuracy for the 
various targets. Pointing at targets located 
on the extreme right and upward required a 
more “strained” muscular response on the 
part of Ss with the result that responses to 
targets located in these areas would be more 
difficult responses to make. From interviews
with Ss after the experiment it was learned
that even though S could not see the joy
stick or his arm, he attempted to point the 
stick at the targets as if he were sighting 
down the right arm; i.e., S was pointing the 
entire arm as well as the stick at the targets. 
Such aiming was not possible with all targets, 
but it was Ss’ belief that when such aiming 
was possible, they were able to respond with 
more ease and accuracy. Such a mechanism 
operating in the pointing procedure would 
have facilitated responses made to targets to 
the left and downward. The data imply that 
these two factors, manner of hand grip and 
aiming as if with the entire right arm, may
have influenced pointing accuracy, for responses
towards targets which were downward
and to the left were more accurate than
those upward and to the right.

Although statistical analysis indicates that
no improvement in performance took place
during the course of the experimental trials,
the fact that Ss respond with consistent error
biases would indicate that Ss may learn to
point more accurately if they are given knowledge
of their results during the course of
practice.


Summary 


An experiment was conducted to determine
how accurately man can point a joy stick at
visual targets without visual feedback as to
the position of the stick. The results of the
experiment may be summarized as follows:

1. Pointing errors ranged from 1° to 52°,
with a mean of 13.4°. Ninety per cent of
the pointing errors were 25° or less in magnitude.

2. There was a correspondence between
magnitude of errors and locus of the visual
target. Errors were largest for responses
made to targets located to the right and
above the center of the room. There was a
tendency for responses to be more accurate if
they were made to targets either to the left
or below the center of the room.

3. There was a tendency for Ss to undershoot
targets located around the periphery of
the target space and to overshoot targets located
around the center of the space.


Received July 16, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, 1954


Rate Accuracy in Handwheel Cranking * 


Robert S. Lincoln 
The Johns Hopkins University 


When a rate of movement is produced by a
human operator in an attempt to reduce error
in tracking performance, accuracy of control
may be limited by the inability of the operator
to maintain a steady speed. His rate of
movement may not continuously match the
required rate. Because of this limitation it
becomes important to determine the accuracy
with which various rates of movement can be
maintained.
In a handwheel cranking task it is possible
to introduce variation in the required rate of
movement in three ways. 1. The required
angular speed of rotation may be increased
or decreased for a given radius of movement.
When this is done, the required linear rate
also increases or decreases. Linear and angular
rates, therefore, change in combination.
Linear rate in this case refers to the units of
distance traveled per unit of time by the
handwheel knob. 2. The required linear rate
may be changed while holding the angular
rate constant. This is accomplished by varying
the radius of movement for any given angular
rate. 3. The required angular rate may
be changed while the linear rate is held at
a constant value. This is accomplished by
making compensating adjustments in the
radius of movement as angular speed changes.
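
The three kinds of variation follow from the relation between the linear speed of the knob, the angular speed, and the radius. The sketch below assumes angular speed in revolutions per minute and radius in centimeters; these units, the numerical values, and the function names are illustrative assumptions, not conditions reported in the study.

```python
import math


def linear_rate(rpm, radius_cm):
    """Linear speed of the handwheel knob in cm/sec: circumference x rev/sec."""
    return 2.0 * math.pi * radius_cm * rpm / 60.0


def radius_for(linear_cm_per_sec, rpm):
    """Radius (cm) needed to obtain a given linear speed at a given rpm."""
    return linear_cm_per_sec * 60.0 / (2.0 * math.pi * rpm)


# 1. Fixed radius: changing angular speed changes linear speed with it.
for rpm in (30, 60, 120):
    print("way 1:", rpm, "rpm ->", round(linear_rate(rpm, 5.0), 2), "cm/sec")

# 2. Fixed angular speed: changing the radius changes only linear speed.
for radius in (2.5, 5.0, 10.0):
    print("way 2:", radius, "cm ->", round(linear_rate(60, radius), 2), "cm/sec")

# 3. Fixed linear speed: the radius is adjusted as angular speed changes.
target_speed = linear_rate(60, 5.0)
for rpm in (30, 60, 120):
    print("way 3:", rpm, "rpm ->", round(radius_for(target_speed, rpm), 2), "cm")
```

Under the third arrangement, doubling the angular speed requires halving the radius to hold the knob's linear speed constant.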
The effects of variation in both linear and
angular rates of movement have been studied
with respect to performance in manual tracking
tasks. Helson (3) has measured the accuracy
obtained with various rates of movement
and different radii of cranking in compensatory
tracking. Lincoln and Smith (4)
investigated the angular and linear rates in
a direct-pursuit tracking task.
The relationship between rate of movement
and accuracy is complicated, however, by the
nature of the tracking devices.

* This work was done under [contract and task
order illegible] between the Office of Naval Research
and The Johns Hopkins University. This is
[Report No.] 166-1-178, Project Designation No. NR
[...], under that contract. Miss Frances Wolf[...]
aided in the collection and analysis of the data.




The accuracy of direct-pursuit tracking de- 
pends upon the combined accuracy of both 
rate and positioning movements (4). This 
results from the fact that accurate tracking 
is achieved only when the operator matches 
the position of the target with a specific po- 
sition of the handwheel knob and simultane- 
ously matches the rate of target movement 
with a proportional rate of handwheel motion. 

Consistent relationships between the posi- 
tion of the target and the position of the 
handwheel knob are eliminated with com- 
pensatory tracking devices, but in both com- 
pensatory and pursuit tasks it is possible to 
achieve the required rate of movement with- 
out maintaining the alignment of cursor and 
target. For this reason tracking error rec- 
ords do not give an accurate picture of the 
operator’s ability to maintain a steady rate 
of speed. Furthermore, in both of the studies 
described, changes in the required angular 
rate of cranking were produced by changes in 
gear ratios. This procedure reduces the load 
on the handwheel as rate increases and may 
affect the relationship between speed and ac- 
curacy. 


Purpose of this Experiment 


This report is concerned with the accuracy 
with which different linear and angular rates 
of cranking movements can be maintained for 
clockwise and counterclockwise directions of 
turning. 

In order to eliminate the difficulties inher- 
ent in tracking devices and to study rates of 
movement in greater isolation, a special task 
has been devised. Four characteristics of the 
task are of particular importance. 1. The 
subject is presented with a display consisting 
of a target and cursor. The cursor instan- 
taneously indicates the rate of handwheel 
movement by its spatial position relative to 
the target. It is impossible for the subject to 
achieve the required rate of movement without aligning the target and
cursor. Errors are recorded as errors in rate. 2. With the apparatus
used there is no relationship between the angular position of the
handwheel knob and the linear position of the cursor. The position of
the cursor is solely dependent upon the rate of cranking. 3. The target
remains stationary. 4. No changes in handwheel load occur with changes
in the required angular rate of cranking. The "feel" of the handwheel
does change when linear rate is changed for a constant angular speed.
This results from the greater leverage of the larger handwheels.

Robert S. Lincoln

Fig. 1. Schematic diagram of the apparatus: (1) accuracy-recording clock;
(2) electric counters; (3) display galvanometer; (4) screen; (5) crank; (6)
tachometer generator; (7) light source for revolutions counter; (8)
photoelectric cell; (9) light source; (10) recording galvanometer; (11)
variable resistor; (12) light source; (13) photoelectric cell.


Apparatus ¹

Figure 1 is a schematic diagram of the ap- 
paratus. The subject turns the crank with 
his right hand. The center of rotation is 81 
cm. above the floor and the subject is seated 
with his right arm directly in line with the 
crank shaft. Friction in the system is kept 
at a low value by the use of ball bearings, 
and there is little inertia in the handwheel. 

Attached to the end of the crank shaft is a small pulley that drives a
second pulley mounted on the shaft of a tachometer generator. The
output of the generator drives two mirror galvanometers that are wired
in [...] with the generator. Separate light sources are focused on each
of the mirrors. A patch of light 1 cm. high and 3 mm. wide is reflected
from one of the mirrors to the back of a screen. Two vertical black
lines, 3 mm. apart, are drawn on the screen. These black lines serve as
the target. The subject's task is to keep the reflected patch of light
between the two lines by turning the crank at a given rate. Variation
in the required angular rate of cranking is achieved by adjusting a
potentiometer that controls the resistance in the circuit between
generator and galvanometers. The greater the resistance in the circuit,
the greater the rate of cranking required to center the light-spot
(cursor) between the target lines.

A circular spot of light reflected from the second galvanometer to the
surface of a photoelectric cell provides the main source of error
indication in the apparatus. Error tolerance is determined by the
angular position of the photocell relative to the recording
galvanometer. When the subject turns the handwheel at the required
rate, plus or minus the chosen error tolerance, the light from the
recording galvanometer strikes the photocell. This activates an
electric clock that is read to the nearest .01 of a second. Another
clock times the trial period and shuts off the apparatus at any
selected trial-length. The timing and recording clocks do not begin to
operate until the subject first reaches the required rate of turning.
Because of these features, the number of seconds accumulated on the
dial of the recording clock indicates the total time, during a trial
period, in which the subject maintains a rate of turning within the
established tolerance limits.

¹ The apparatus was constructed by Mr. Ervin G. Smith, Jr. of the
Engineering Laboratory, Institute for Cooperative Research, The Johns
Hopkins University.

Rate Accuracy in Handwheel Cranking
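
The scoring logic of the recording clock can be sketched in software. The following is only an illustrative model, not the apparatus itself: it assumes the rate of turning is available as evenly spaced samples (the hypothetical argument `rates_rpm`), treats "first reaches the required rate" as the first sample inside the tolerance band, and uses the ± 9% tolerance reported in the Procedure as a default.

```python
def score_trial(rates_rpm, required_rpm, dt=0.01, tolerance=0.09):
    """Return (seconds within tolerance, number of entries into tolerance).

    rates_rpm : sampled rates of turning, one sample every dt seconds
                (a hypothetical digitized record, not part of the 1954 apparatus)
    """
    low = required_rpm * (1.0 - tolerance)
    high = required_rpm * (1.0 + tolerance)
    started = False      # clocks stay off until the required rate is first reached
    on_target = False    # whether the previous sample was inside the band
    seconds = 0.0
    entries = 0
    for rate in rates_rpm:
        inside = low <= rate <= high
        if not started:
            if not inside:
                continue
            started = True
        if inside:
            seconds += dt
            if not on_target:
                entries += 1   # a new entry into the tolerance band
        on_target = inside
    return seconds, entries


# Hypothetical record sampled once per second at a required rate of 100 rpm.
time_on_target, entries = score_trial(
    [0, 50, 100, 100, 120, 100, 100], required_rpm=100, dt=1.0
)
```

The second returned value counts entries into the tolerance band, which is one possible reading of what the oscillation counter registers.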
Two other indications of performance are recorded simultaneously. One
electric counter indicates the number of times during a trial period
that the subject's rate of turning falls within the tolerance limits.
This is a measure of the frequency of oscillation in rate within a
trial. A second electric counter counts the number of revolutions
actually turned during a trial. At low speeds accuracy in counting is
achieved by counting to the nearest .2 of a revolution. At the higher
speeds only one count per revolution is possible. Both the recording
clock and the counters are enclosed in soundproofed boxes.

Variation in the radius of turning is achieved by attaching the knob to
the handwheel at various distances from the center of rotation. A
counter weight is also provided.

Accurate recording for counterclockwise turning is achieved by
reversing the tachometer generator end-for-end since the generator does
not produce reliable voltages when the direction in which the shaft
turns is reversed.


Procedure 


Thirty right-handed male subjects were used in the experiment. Fifteen
subjects turned the handwheel in the clockwise direction, and fifteen
different subjects turned the handwheel in the counterclockwise
direction. All subjects were given five consecutive trials, 30 seconds
in length, at each of five angular rates of cranking combined with each
of three crank radii. The rates used were 25, 75, 125, 175, and 225
revolutions per minute. The crank radii were 2.5, 7.5, and 12.5 cm.



The orders in which the subjects cranked 
under the various conditions were determined 
by a 15 × 15 Latin square. The same square
was used for both directions of cranking. 
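
As an illustration of how such a design orders the fifteen speed-radius combinations, the sketch below builds a simple cyclic Latin square. The report does not say which particular square was used, so the cyclic construction and the names `cyclic_latin_square` and `condition_orders` are assumptions made only for the example.

```python
from itertools import product

RATES_RPM = [25, 75, 125, 175, 225]   # angular rates from the Procedure
RADII_CM = [2.5, 7.5, 12.5]           # crank radii from the Procedure


def cyclic_latin_square(n):
    """n x n square in which each symbol occurs once per row and per column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def condition_orders():
    """One order of the 15 (rate, radius) conditions for each of 15 subjects."""
    conditions = list(product(RATES_RPM, RADII_CM))
    square = cyclic_latin_square(len(conditions))
    return [[conditions[k] for k in row] for row in square]


orders = condition_orders()
first_subject = orders[0]   # the order of conditions for the first subject
```

Each row gives one subject's sequence, and each condition appears exactly once in every serial position across the fifteen subjects of a direction group.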

Subjects were instructed to keep the cursor- 
light centered between the two target lines at 
all times. The apparatus was so arranged 
that the cursor moved to the right as the 
speed of turning increased from zero velocity 
for clockwise turning. For counterclockwise 
cranking the cursor moved to the left as speed 
increased. The range of error tolerance was 
kept at ± 9% of the required rate of turning.
Subjects were not aware that there was any 
tolerance in the scoring system. 


Results 


Effects of Linear and Angular Rates. In 
this experiment linear rate was varied inde- 
pendently of angular rate by changing the 
radius of movement for various constant an- 
gular speeds. When the angular speed of 
movement is constant, rate-accuracy increases 
with increased linear rate in the lower range 
of handwheel speeds. At the higher hand- 
wheel speeds this relationship is reversed, and 
accuracy decreases as linear rate increases. 

Figure 2 pictures these results which are 
similar to those obtained by Helson with a 
compensatory tracking task (3). Helson, 
however, did not distinguish between angular 
and linear rates of movement. In Figure 2 
linear rate increases with increased handwheel 
radius for any one handwheel speed, but the 
actual linear rates are not equal for the same 
radius at different handwheel speeds. 
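
The relation between the two rates can be made concrete with a short calculation. This is a sketch from the geometry of circular motion only (the knob travels one circumference per revolution); the function names are hypothetical and the units are chosen for convenience.

```python
import math


def linear_rate_cm_per_sec(radius_cm, rpm):
    """Linear rate of the handwheel knob: circumference times revolutions per second."""
    return 2.0 * math.pi * radius_cm * rpm / 60.0


def radius_for_linear_rate(linear_cm_per_sec, rpm):
    """Radius needed to hold a given linear rate at a given angular rate."""
    return linear_cm_per_sec * 60.0 / (2.0 * math.pi * rpm)


# Linear rate for every combination of speed and radius used in the study.
table = {
    (rpm, radius): linear_rate_cm_per_sec(radius, rpm)
    for rpm in (25, 75, 125, 175, 225)
    for radius in (2.5, 7.5, 12.5)
}
```

For example, 75 rpm with the 2.5-cm radius and 25 rpm with the 7.5-cm radius give the same linear rate, which is why the same radius corresponds to different linear rates at different handwheel speeds.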

The significance of the differences between 
handwheel speeds, radii, and the interaction 
between these two variables was tested by the 
non-parametric analysis of variance described 
by Friedman (2) and Wilcoxon (6). This 
procedure was necessary because the data 
exhibited heterogeneity of variance when sub- 
jected to Bartlett’s test (1). For both direc- 
tions of turning the two main variables and 
the interaction between them were significant 
sources of variation (p < .01). The results 
for the two directions of movement are com- 
bined in Figure 2 because they show similar 
tendencies. 
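
For readers unfamiliar with the rank procedure, the basic statistic described by Friedman (2) can be sketched as follows. This is a minimal illustration only: it uses average ranks for ties without a tie correction, it does not reproduce how the interaction was tested in this study, and the data in the usage lines are hypothetical.

```python
def average_ranks(values):
    """Ranks from 1 upward, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman's chi-square for n subjects (rows) by k conditions (columns)."""
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores: four subjects who all order three conditions the same way.
perfect_agreement = friedman_statistic([[1, 2, 3]] * 4)
```

The statistic is referred to the chi-square distribution with k - 1 degrees of freedom; with complete agreement among subjects it reaches its maximum of n(k - 1).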

Figure 3 is a graph of the relationship between accuracy and angular
rate when linear speed is constant. The values in the figure were
obtained by interpolation between points on Figure 2 and calculation of
the actual linear rates. Linear rate may be held constant while angular
rate changes by adjusting the radius of the cranking movements.

Fig. 2. Accuracy as a function of handwheel speed for different crank
radii. The ordinate values indicate the mean time that the rates were
maintained within specified tolerance limits. A score of thirty is the
maximum possible accuracy score. (Ordinate: accuracy in seconds;
abscissa: required handwheel speed in rpm.)

Figure 3 shows that, for various constant 
linear rates, rate-accuracy increases with an- 
gular speed up to about 175 rpm. For the 
lower linear rates it appears that slightly 
greater accuracy might be obtained at speeds 
beyond 175 rpm. It would be necessary
to extrapolate values in order to extend all
curves over the entire range of angular speeds.

There is one factor which does not remain 
constant when linear speed is held at a fixed 
value by adjusting handwheel size as angular 
speed increases. Different muscles become 
involved in the control of the handwheel as 
size is changed. This factor is of relatively 
little importance for the handwheels used in 
this study.

It might be expected that increased accuracy would result from
increased angular or linear rates of movement since, as Helson has
pointed out (3), the absolute sensitivity of the handwheel decreases as
rate increases. For example, with a ± 9% error tolerance the range of
permissible error is about [...] rpm at a rate of 25 rpm, and [...] rpm
at a rate of 225 rpm. Inspection of Figure 2, however, indicates that
at the higher handwheel speeds little advantage is taken of the
decreased sensitivity. For the larger radii accuracy actually drops off
with handwheel speeds greater than 75-125 rpm. This effect cannot be
related to the physical limitations of the subjects. Inspection of the
records obtained with the revolutions counter showed that, with 11
minor exceptions in [...] trials, all subjects were capable of cranking
with all radii at an average angular rate greater than 175 rpm when
attempting to achieve the rate of 225 rpm.
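
The change in absolute sensitivity follows directly from a tolerance that is a fixed percentage of the required rate. The sketch below simply evaluates the ± 9% band at each required speed; the function name is hypothetical and the printed figures in the original may have been rounded differently.

```python
def tolerance_band(required_rpm, tolerance=0.09):
    """Lower and upper limits of the permitted rate for a proportional tolerance."""
    half_width = required_rpm * tolerance
    return required_rpm - half_width, required_rpm + half_width


# Permitted band at each required speed used in the study.
bands = {rpm: tolerance_band(rpm) for rpm in (25, 75, 125, 175, 225)}
slow_low, slow_high = bands[25]     # about 22.75 to 27.25 rpm
fast_low, fast_high = bands[225]    # about 204.75 to 245.25 rpm
```

The band at 225 rpm is nine times as wide in absolute terms as the band at 25 rpm, which is the sense in which the handwheel becomes less sensitive as rate increases.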

The data concerning the frequency of oscillations in rate within a
trial suggest that the oscillatory nature of cranking movements places
a limit on the rate-accuracy achieved. These frequencies are shown in
Figure 4 in which the data for the two directions of turning are again
combined.

A single oscillation in rate, as defined in this experiment, includes
both the change to a rate greater than the error tolerance and the
return to a rate within the error tolerance. From Figure 4 it is
apparent that the number of oscillations increases with both linear and
angular rate in spite of the reduction in sensitivity. For the two
larger radii the number of oscillations increases at an increasing rate
as the required handwheel speed is raised. In contrast, the number of
oscillations for the smallest radius increases at a much slower rate.
Non-parametric analysis of variance established the significance of the
over-all effect of handwheel speeds and radii upon the number of
oscillations per minute (p < .001).

Fig. 3. Accuracy as a function of handwheel speed for different
constant linear rates. The values on the curves were obtained by
interpolation between points in Figure 2 and calculation of the actual
linear rates. (Abscissa: required handwheel speed in rpm.)

Figure 5 shows the durations of the mean rate-oscillations in seconds
for the various handwheel speeds and radii. The durations plotted in
the figure were obtained from Figures 2 and 4. The accuracy scores for
each point in Figure 2 were first subtracted from the maximum possible
score of 30 seconds. The resulting values indicated the time spent
exceeding the error tolerance in an average trial. These scores were
then divided by the number of oscillations in rate for the appropriate
conditions shown in Figure 4. The obtained values were the mean
durations of the rate-oscillations in seconds. Figure 5 shows that the
duration of the mean oscillations in rate is a decreasing function of
handwheel speed.

Fig. 4. Frequency of rate-oscillations as a function of handwheel speed
for different crank radii. A single oscillation includes both the
change in rate beyond the tolerance limits and the return to a rate
within the tolerance limits. (Ordinate: number of oscillations per
minute; abscissa: required handwheel speed in rpm; curves for radii of
2.5, 7.5, and 12.5 cm.)

Fig. 5. Durations of mean rate-oscillations. (Abscissa: required
handwheel speed in rpm; curves for radii of 2.5, 7.5, and 12.5 cm.)
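
The derivation of the plotted durations amounts to a two-step calculation, sketched below. The function takes the number of oscillations in one trial; whether the original computation converted the per-minute frequencies of Figure 4 to a 30-second trial is not stated, so that conversion is left to the caller. The numbers in the usage line are hypothetical and are not read from the figures.

```python
TRIAL_SECONDS = 30.0   # maximum possible accuracy score for one trial


def mean_oscillation_duration(accuracy_seconds, oscillations_per_trial):
    """Mean time outside the tolerance limits per rate-oscillation, in seconds."""
    time_off_target = TRIAL_SECONDS - accuracy_seconds
    return time_off_target / oscillations_per_trial


# Hypothetical condition: 20 seconds within tolerance and 40 oscillations in a trial.
example_duration = mean_oscillation_duration(20.0, 40)
```

With these illustrative values the 10 seconds spent outside the tolerance limits are spread over 40 oscillations, giving a mean duration of a quarter of a second.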

Considered together, Figures 4 and 5 show 
why accuracy in Figure 2 does not continue 
to increase above certain speeds as the sensi- 
tivity of the handwheel decreases. At the 
lower handwheel speeds the number of rate- 
oscillations increases slowly as angular and 
linear speeds increase in combination, but the 
durations of the oscillations decrease rapidly. 
More errors are made, but they are elimi- 
nated much more quickly. The result is in- 
creased accuracy with increased speed. In 
the middle range of handwheel speeds the 
durations of the oscillations still decrease as 
speed increases, but at a much slower rate. 
At the same time, however, the number of 
oscillations is increasing rapidly. The result 
is a levelling-off and even a decrease in ac- 
curacy. 

This interpretation also accounts for the 
relative accuracy obtained with different radii 
of movement. At the low handwheel speeds, 
for example, a greater number of oscillations 
appear with the larger handwheels. However, 
the durations of the oscillations are shorter 
for the larger handwheels and increased ac- 
curacy results. 

Figure 5 provides suggestions concerning the nature of the responses
involved in main- [...]

Fig. 6. Number of rate-oscillations per handwheel revolution.
(Ordinate: number of oscillations per revolution; abscissa: required
handwheel speed in rpm; curves for radii of 2.5, 7.5, and 12.5 cm.)

Effects of Direction of Cranking. Although the accuracy achieved in
clockwise turning was slightly higher than for counterclockwise turning
on four of the five speeds, the differences were not significant when
tested by the necessary non-parametric methods. Real differences
between directions were established, however, in another measure of
performance: the constant rate error.

Figures 7 and 8 show the constant errors in the average rate of
cranking in rpm. Figure 7 is for clockwise turning, while Figure 8 is a
plot of the data for counterclockwise turning. For both directions the
sign of the constant errors indicates that, considered as groups, the
subjects tended to crank at rates that were slower than the required
rates although they were capable of maintaining higher rates. Beyond
the speed of 175 rpm this statement does not hold since some subjects
could not maintain the rate of 225 rpm.

The constant errors for the counterclockwise direction are
significantly greater than for the clockwise direction (p < .01). In
addition, constant error shows a significant increase (p < .01) in size
as linear rate increases with constant angular rates for both
directions of cranking. For the counterclockwise direction only,
constant error shows a significant increase (p < .001) in size as
angular and linear rates increase in combination. The speed of 225 rpm
was not included in the calculation of the latter two probabilities
because the error at that speed is influenced by the physical
limitations of the subjects.

If subjects do tend to ignore their unsteadiness in rate and center
their oscillations about the target, they do so with a persistent bias
toward rates that are slower than the required rate. The bias is
greater for cranking in the counterclockwise direction.

Fig. 7. Constant error in revolutions per minute for clockwise turning.
(Abscissa: required handwheel speed in rpm.)

Fig. 8. Constant error in revolutions per minute for counterclockwise
turning. (Abscissa: required handwheel speed in rpm; curves for radii
of 2.5, 7.5, and 12.5 cm.)
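
The constant rate error can be illustrated with a short calculation from the revolutions-counter record. This is a sketch under the assumption that the error is the average achieved rate over the timed period minus the required rate, so that negative values indicate cranking that is too slow; the function name and the example figures are hypothetical.

```python
def constant_error_rpm(revolutions, trial_seconds, required_rpm):
    """Average achieved rate minus required rate, in revolutions per minute."""
    achieved_rpm = revolutions * 60.0 / trial_seconds
    return achieved_rpm - required_rpm


# Hypothetical trial: 12 revolutions counted in 30 seconds at a required 25 rpm.
lag = constant_error_rpm(12.0, 30.0, 25)
```

Here the subject averaged 24 rpm, a constant error of -1 rpm, which is the direction of bias reported for both directions of cranking.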


Summary

Subjects cranked a handwheel at each of five different speeds combined
with each of three different radii. Fifteen subjects cranked in the
clockwise direction while fifteen other subjects cranked in the
counterclockwise direction.

Subjects were provided with an instantaneous visual indication of their
rate of cranking. In appearance the task resembled a conventional
tracking problem. With the apparatus used, however, the task was
reduced to the maintenance of the required rates. Consistent positional
relationships between the indicator and the handwheel knob were
eliminated, and it was impossible for the subjects to achieve the
required rate without aligning the indicator and target. In addition,
no changes in handwheel load were introduced by changes in the required
speed of cranking.

At the lower handwheel speeds, rate-accu- 
racy improved with increases in the linear 
rate of movement for a constant angular rate 
At the higher angular speeds an inverse rela-
tionship appeared between linear rate and ac-
curacy. Linear rate refers to the units of
distance traveled per unit of time by the 
handwheel knob. Linear rate was varied in- 
dependently of angular rate by changing the 
radius of the movement. 

For constant linear rates accuracy always 
improved with increased angular rate up to 
about 175 rpm. The failure of accuracy to 
continue to improve above a certain point 
when linear and angular rates were increased 
in combination was attributed to the oscil- 
latory nature of cranking movements. 

Subjects tended to crank at rates slower 
than the required rate although they were 
capable of maintaining the required rate. 
This tendency increased as both linear and 
angular rate increased. No significant dif- 
ferences in accuracy appeared between the 
two directions of movement, but those sub- 
jects who cranked in the counterclockwise di-
rection showed a significantly greater tend-
ency to lag in rate.


Received November 16, 1953. 
Early publication. 


References 


1. Edwards, A. L. Experimental design in psychological research. New
   York: Rinehart, 1950.

2. Friedman, M. The use of ranks to avoid the assumption of normality
   implicit in the analysis of variance. J. Amer. statist. Ass., 1937,
   32, 675-701.

3. Helson, H. Design of equipment and optimal human operation. Amer. J.
   Psychol., 1949, 62, 473-497.

4. Lincoln, R. S. and Smith, K. U. Systematic analysis of factors
   determining accuracy in visual tracking. Science, 1952, 116, 183-187.

5. Lincoln, R. S. and Smith, K. U. Visual tracking: II. Effects of
   brightness and width of target. J. appl. Psychol., 1952, 36, 417-421.

6. Wilcoxon, F. Some rapid approximate statistical procedures. American
   Cyanamid Co., 1949.


The Journal of Applied Psychology
Vol. 38, No. 3, 1954


Applied Psychology in Action 


Reply to Dr. Wells and to Miss Epstein * 


Howard D. Hadley


Morey, Humm, and Johnstone, Inc., New York, N. Y. 


Dr. Wells is correct in his analysis. I 
should have used credulity. 

I should also like to make some comments about Miss Epstein's note. In
therapy, you are concerned, at least at first, with reducing threats.
In advertising, where you are dealing with more "normal" persons, the
task at hand is to offer enhancements to the consumer. In this latter
case, threat is something to be avoided, not to be "cured." In both
instances, a sympathetic atmosphere is a primary requisite. I was
mainly concerned with the atmosphere created by both advertisers and
therapists.

Miss Epstein is somewhat correct when she says that the
non-interference principle is not applicable to advertising. If there
were no "interference," there would be no advertising. However, doesn't
a patient have an attitude towards the therapist at the end of the
sessions? Also, aren't these attitudes often favorable? By not
"interfering," may not a favorable attitude be developed?

Actually, there is no pure example of inferred advertising. Direct and
inferred are contrasted to sharpen the concept. Just as there are few
(if any) completely introverted or extroverted persons, there are few
(if any) advertisements which are completely direct or inferred.

In the end, it is the atmosphere created by the advertisement that is
important. Direct-inferred, directive-nondirective, are more logical
constructs than useful tools. It's the end result that is important.

* Wells, F. L. Comment on word meaning. J. appl. Psychol., 1954, 38,
133. Epstein, Mary. A note on "the non-directive approach in
advertising appeals." J. appl. Psychol., 1954, 38, 133-134.


[...] as C. J. Berger, A. Chap[...], W. Garner, W. [...] Loucks, R. E.
[...] and M. J. Warwi[...] for them to be acceptable to the trade [...]
standards but some were adopted mainly by way of footnotes and
recommendations.

The British Standards Institution subs[...] as its starting point an
Admiralty Report (Naval Motion Study Report No. 48) which I had written
summarizing all available research on the subject. This Committee has
now been sitting for two years and not only has it con[...] research to
be done to fill gaps in our knowledge. One such experiment is being
carried out at the moment on the relationship between dial size and
accuracy of reading.

This is, I believe, the first instance of psychological research being
used as the basis of deliberations by the British Standards
Institution.


Personnel Psychology and Small Business 


W. Grant Dahlstrom 


University of North Carolina 


The psychologist acting as consultant to a small concern often operates
under serious restrictions on the usual personnel methods. Often the
selection of employees is only a small part of his [...] services. The
number of new men being hired is small, the turnover rate may be
infinitesimal, and even a survey of the men on the job may yield paltry
information in comparison with usual employment studies. Ideally, this
sort of operation should be carried on at a national level with many or
all of the small concerns of the same sort participating in the
project. This would probably be feasible through arrangements between
some association of these concerns and [...] psychological consultants
similar in [...] the Psychological Corporation. Lacking [...], the
psychologist locally still [...] persistent difficulties.

When a local concern invests considerable time and [...] in job
training a professional level man, the contribution of the
psychological consultant in screening applicants could be valuable. The
method of choice in handling the problem would include the utilization
of [...] tests. But the situation is [...] because of the difficulties
in the way of [...] score standards on these tests. In the opinion of
the writer, there are three lines of evidence which can support the
consultant in making his judgments and recommendations.

The first of these is the degree of homogeneity in the test [...] by a
survey of existing employees. Recently the writer obtained data on a
firm of medical consultants


providing a business advisory service to phy- 
sicians and dentists. There were only 10 field 
men and 2 central office men who had previ- 
ously acted as medical consultants themselves. 
The results on the Strong Vocational Interest 
Blank gave a compelling impression of homo- 
geneity. (In the group, 5 men had A ratings 
in both areas VIII and IX; 4 in IX; and 3 
others in VIII. The only other A ratings oc- 
curred in area III twice, area V twice, and 
areas VII and X once each. Very low rat- 
ings appeared in area I in more than half 
the group.) The MMPI results, though less
uniform, were also rather homogeneous. (The 
triad of scales Pd, Pa, Ma were at a codable 
level (above 54 T-score) in half the group, 
with Pa appearing at this level in 10 of the 
12 tests. The K scores ranged above 60 T- 
score without exception.) 

Obviously such a concept of homogeneity 
is relative. The men are very similar when 
you consider the assorted patterns from men- 
in-general. The uniformity is less impressive 
if reference is made to male college graduates 
only, or to business majors, or even more ap- 
propriately to those men with sufficient train- 
ing to be considered at all for employment in 
such a company. Nevertheless, this seems a 
workable concept. If sufficient data were 
available in a usable form on various general 
groups, the psychologist could make a judg- 
ment about the homogeneity resulting from 
selective survival. Not a great deal of the 
normative data on our multiscale tests is pub-
lished in a form in which we can judge rela-
tive frequency of particular score combina-
tions. A quantitative index of relative uni-
formity could be devised which would serve 
better than judgment in this matter. 

This is not meant to imply that experience 
on the job, faking of test responses, and simi-
lar sources of variance may not also be op-
erating to produce this uniformity, but as re-
search information is accumulated on such
factors as these, reasonable allowance could 
be made for them as well. This is one of the 
most workable meanings that can be offered 
for theory, as it is to be used in the areas of 
employee selection or vocational choice as
discussed by Dr. Super in his divisional presi-
dential address.


[...] available to the [...] this matter of [...] the test survey [...]
homogeneous, but [...] expectations based on [...] components, then
[...] even firmer ground [...] consultant. These [...] marked interest
in [...]

up on the tests. This actually proved to be the common core of test
results from the group. Workers more familiar with the tests and with a
wider knowledge of job breakdowns could erect a much more detailed set
of expectations. Psychological consultation will always involve a
modicum of this sort of psychologizing, even in the face of greater
usage of actuarial methods at specific decision points. Accumulated
research findings should facilitate the formulation of these working
hypotheses.

The third line of evidence stems from the correspondence between a
man's rated proficiency on the job and the degree of approximation in
test score pattern to the [...] employee. Here restriction in range of
score dispersion is a serious limitation, and if the psychologist finds
even a moderate relationship, he may assume more satisfactory values
would be obtained with a wider sampling.

This last point is obviously the most deceptive line of evidence of the
three since such eventualities as non-rectilinearity in the correlation
surface, unexpected discontinuities in the functions, or errors in the
validity data may arise to embarrass him. These assumptions would have
to be continually checked against research findings from psychologists
more favorably situated in respect to criterion data and research
samples.

If the consultant takes the trouble to institute a testing program and
finds homogeneity, consistency with expectations, and fair
corroboration of test findings with on-the-job performance, he can
operate with considerably more confidence and effectiveness on even
small projects than he could on the basis of sheer intuitive
speculation alone.




Book Reviews 


Kinsey, A. C., Pomeroy, W. B., Martin, C. E., Gebhard, P. H., et al.
Sexual behavior in the human female. Philadelphia: W. B. Saunders
Company, 1953. Pp. xxx + 842. $8.00.

Hiltner, S. Sex ethics and the Kinsey reports. New York: Association
Press, 1953. Pp. xi + 238. $3.00.

Aberle, Sophie D. and Corner, G. W. Twenty-five years of sex research.
History of the National Research Council Committee for Research in
Problems of Sex, 1922-1947. Philadelphia: W. B. Saunders Company, 1953.
Pp. v + 248. $4.00.

Applied psychology has grown in direct ratio to the accumulation of
facts. But in the area of sex behavior, answers to the question (with
apologies to Dragnet) "What are the facts, Ma'am?" have been hard to
come by. This is due chiefly to our puritanical heritage which has
thrown such strong taboos around the subject. In a real sense, we are
now witnessing the final retreat of the censor [...] man the right to
knowledge of the natural [...] of the body. This retreat began [...]
years ago when Vesalius dared to pierce [...] of the mystery of what
goes on inside the human body and, 100 years later, his disciple,
William Harvey discovered circulation of the blood.

One hesitates to comment in detail concerning the epoch-making work of
Kinsey and his co-workers. This is so because of the extensive reviews
that have already appeared in connection with his earlier publication
on the male. It is also because his book on the female has already
received such widespread publicity by newspapers, magazines, and on
radio and TV. One would only be repeating the many critical and
uncritical evaluations already written or said. For this reason this
review merely points to the facts that have been marshalled in the
enormous range of individual differences in reported capacity, or
rather claimed performance, and to the extensive detailed information
set forth in the part entitled "Comparisons of Female and Male." The
sex difference shown by the

fact that in 50 out of 33 items the male, on 
the average, is more readily affected by psy- 
chological stimuli is worthy of special note. 
This finding provides a wealth of insight for 
better understanding of the psychology of hu- 
man males and females. 

The applied psychologist would do well to 
follow closely discussions of Kinsey’s work by 
representatives of organized religion. Rever- 
end Seward Hiltner, who is pastoral consult- 
ant to the Editorial Advisory Board of Pas- 
toral Psychology Magazine, which was founded 
in 1950, has written a detailed interpretation 
of all aspects of Kinsey’s reports. His book 
will aid many clergymen to assimilate the 
findings with a minimum of trauma. It 
should enable them to do a better job of un-
derstanding and counseling their parishioners,
young or old. Enlightened premarital and 
marriage counseling as a pastoral duty has 
been going on to an ever-increasing extent 
for a generation. Hiltner’s book will un- 
doubtedly accelerate this important move- 
ment. 

The significance of Kinsey’s work and of 
Hiltner’s interpretation can be fully under- 
stood only by studying the magnificent 
achievements of the NRC Committee for Re- 
search in Problems of Sex. Aberle and Cor- 
ner’s report gives due credit to Robert M. 
Yerkes who was chairman from 1922 to 
1947. Yerkes’ foresight, initiative, tact, cour- 
age, and everlasting persistence were pri- 
marily responsible for this development. He 
was able to secure the collaboration of top- 
notch scientists. He was also able to secure 
continuity of financial support. The Com-
mittee courageously supported research on all
aspects of sex in all species from paramecium
to man. Scores of researches were supported
and hundreds of research reports were pub- 
lished in a wide range of scientific journals, 
monographs, and books. The bulk of the 
work was directed toward infrahumans but, 
from the beginning, research on sex behavior 
in man was strongly supported. The latter 
studies were begun by R. S. Lee, Adolph 
Meyer, L. M. Terman, and W. R. Miles in 
the nineteen twenties, were continued in the
thirties by Carney Landis, E. Lowell Kelly,
and Terman. Since 1937, Dr. Kinsey and his
group at Indiana University have been the
chief beneficiary.

Thus psychology has moved from an intel- 
lectualistic preoccupation with man as a ra- 
tional being to a more realistic understand- 
ing of man as a behaving organism in all of 
his manifold adjustments. In short, sex can 
no longer be ignored. 


Donald G. Paterson 
University of Minnesota 


Lundin, R. W. An objective psychology of 
music. New York: Ronald Press, 1953. 
Pp. ix +303. $4.50. 


This book is a noteworthy addition to the 
psychology of music, especially for classroom 
use with the undergraduate student. Its style 
is clear and simple, its coverage is unusually 
comprehensive, and its range is wide. It will 
truly facilitate the learning process for the 
student, an advantage which has often been 

in this field. The psychology of 
standing of two very 
of them a science, 
The vocabulary and style
employed by the artist has often proved baf-
fling to the scientist, and vice versa. Lundin
has shown a special talent as an interpreter,
and has made his material thoroughly clear
to both. His occasional oversimplifications
will prove justifiable in terms of the student


of the greatest 
tion 


ing scales, and the imitators of the great pio-
neer have done little to improve on his origi-
nal work. Lundin has thrown all these studies
into better perspective and his summaries and
evaluations should save many future missteps
in this special field of investigation.

The most significant guideposts in this par-
ticular area of musical research seem to point
in the direction of cultural rather than
nativistic explanations and interpretations.

Book Reviews 


Lundin does well therefore in taking a stand
for an interbehavioral point of view, and in
steering his readers away from the unsub-
stantial rhapsodies of Howe and the [...]
more refined semantics of later writers, to-
ward the more substantial [...]
arguments, supported by something more than
fine writing. This interbehavioristic view
which Lundin endorses so emphatically per-
vades the thinking of many writers even
though they identify themselves with [...]
and more eclectic schools. Farnsworth,
one of its strongest advocates, has made the
point of view much more attractive and stimu-
lating by demonstrating its usefulness in a
varied program of research.

From many quarters there are signs of
renewed interest and activity in the problems
of esthetics. Audiences and amateur per-
formers in both the graphic and theater arts,
but especially in music, are in a period of
expansion. Now that the groundwork has
been so carefully laid, any pedagogue or ex-
perimentalist, and even Lundin himself, can
go on to be more persuasive and more [...]
in developing the psychology of music.
The whole field has been brought up to date,
the arguments are sound, and the movement
toward further knowledge has been [...]
accelerated by this timely and much needed
book.

(Mrs.) Kate Hevner Mueller


Indiana University 


Woolf, M. D. and Woolf, Jeanne A. The
student personnel program. New York:
McGraw-Hill Co., 1953. Pp. ix + [...].
$5.00.


Subtitled, Its development and integration in
the high school and college, this book
“[...] attempt to picture a comprehensive
student personnel program . . .” (p. v) [...]
draws heavily o[...]

Virtually all phases of the student person-
nel program are dealt with. After a [...]
introductory chapter on the [...]
of the personnel worker, there are chap-
ters [...] group methods, student govern-
ment, [...] housing, remedial services,
[...] orientation, faculty advising,
[...] personnel workers, and adminis-
tration of the program. Unfortunately, there
[...] a discernible framework, no inte-
grating [...] which might have given
added meaning to the extensive content. The
chapters seem almost to be separate essays,
and the [...] on the reader are not al-
ways [...].

It is difficult to specify a single group for
which [...] book is entirely appropriate. Pro-
fessional workers will find parts of it quite
elementary although the many examples will
[...] interest. Beginning students will not
have the necessary backgrounds of informa-
tion to give [...] the critical reading it requires.
Academic administrators are likely to become
bogged [...] which are not woven
into a [...].

It is an unnecessarily difficult book to read,
perhaps because the authors appear not to
have been [...] whether they were doing a
primarily scholarly work or one based mostly
[...] own experiences, comments in the
preface [...]. Those sections re-
porting [...] not well done.
[...] is indicated, for example, by an
acknowledgment to another staff counselor
[...] pointing out that “. . . the refusal of the
counselor to let the client lean on him may
be actual [...]” (p. 33). The theories of
behavior discussed in the chapter on disci-
pline do not represent the best of modern psy-
chology. Elsewhere, they say, without refer-
ence [...], “If on the American Council
on Education Psychological Examination, the
L score percentile ranking is twice that
[...] score . . . or vice versa, there is un-
even development and often a per-
sonal adjustment problem” (p. 227).

The chapter on counseling is uneven, and
the [...] point-of-view is given such em-
phasis as to suggest a pre-eminence not
granted by most counselors. The chapter
on faculty advising is excellent for its many
practical suggestions but is marred by a
poorly handled survey of the literature. The
authors should be commended for their forth- 
right discussion of training requirements in 
which psychology is placed at the center of 
the program. 

Taken as a whole, the defects of this book 
seem to stem from two principal sources. In 
the first place, there is the previously men- 
tioned apparent confusion about whether this 
is a scholarly book or one primarily reporting 
on experiences. Both are valuable and neces- 
sary, but they need to be carefully amalga- 
mated, not cut and patched together. Sec-
ondly, there is the lack of a carefully thought
out and explicitly stated philosophy of stu- 
dent personnel work. The tissues of a good 
book are presented, but there is no articulat-
ing skeleton.

John W. Gustad
University Counseling Center,
University of Maryland


Leitner, K. Hypnotism for professionals.
New York: Stravan Publishers, 1953. Pp.
127. $4.00.

Konradi Leitner was a stage hypnotist who 
became quite well known through his work 
with the USO during the war. In this post- 
humous book he describes his methods of 
working before an audience. He does this in 
a clear and interesting manner but he has 
contributed nothing to the scientific knowl- 
edge of hypnosis. There are a number of 
illustrations in the book using a very pretty 
feminine model with whom I’m sure any
hypnotist would be happy to work.

William T. Heron
University of Minnesota


Jennings, E. E. Techniques of successful
foremanship. Madison: School of Com-
merce, Bureau of Business Research and
Service, University of Wisconsin, Wiscon-
sin Commerce Studies, Vol. I, No. 4,
March, 1953. Pp. 41. $1.15.

This is the report of a study undertaken
for the purpose of gaining an understanding
of the techniques or traits characteristic of
successful foremen prior to undertaking a su-
pervisory training program. The “Jennings
Supervisory Analysis,” a 23-item question-
naire, was administered to 1,682 workers and
their 52 foremen in a large midwestern plant. 
All workers filled out the questionnaire by 
checking items which “outstandingly” de-
scribed their own foremen. Every third
worker filled out a second questionnaire, 
checking the 3 items he considered most de- 
sirable in foremen. The 52 foremen filled 
out 2 questionnaires. On the first, they 
checked items which best described their own 
behavior; on the second, they checked the 3 
items they considered most desirable in fore- 
men. Foremen were rated for over-all ability 
by pooling ratings of their immediate su- 
periors with those made by top management 
in the plant. An appendix describes the 
method used in obtaining and pooling these 
ratings. 

Findings presented are relative to the 23 
items in the questionnaire. This, unfortu- 
nately, places a limitation on their meaning 
because all 23 items are favorable character- 
istics, selected when the questionnaire was 
developed on the basis of being “both gen- 
erally descriptive and desirable” of foremen. 
With this limitation, findings include the fol- 
lowing: (1) the 3 most desirable techniques 
of foremen are be fair to everyone, go to bat 
for workers, and give clear-cut instructions; 
(2) there is little relationship between what 
workers feel is desirable in foremen and their 
descriptions of their own foremen; (3) there 
is little relationship between what workers 
and foremen feel is descriptive of foremen; 
(4) foremen and workers largely agree on 
what the desirable characteristics of a fore- 
man are; (5) traits or characteristics descrip-
tive of foremen rated as successful by their
superiors are also considered desirable by 
workers. 

There is no indication in the report of the
audience for which it is intended. It ap-
pears, however, that it was not intended for
persons with technical background in that
much of what should be included in a report
for this group is lacking. Only gross rank-
ings are presented, without averages or meas-
ures of variability. Correlation coefficients
are used without descriptions of the method
of computation used. Interpretations of sta-
tistical findings are also questionable in some
cases. For example, in an item analysis in
which frequency with which a foreman was
described by an item was correlated with suc-
cess as indicated by superiors’ ratings, 12 cor-
relations, ranging from .28 to .53, are pre-
sented as evidence that these 12 items are
“highly related” to success. The [...]
sections of the report go beyond the data
presented, although the author does point out
that intensive interviewing of foremen was
an additional source of information.
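
One common way to judge whether correlations of .28 to .53 deserve to be called "highly related" is to square them, which gives the proportion of variance in one measure that is linearly accounted for by the other. The sketch below is only an illustration of that yardstick. It is not taken from the report or from the review, and the function name `variance_explained` is hypothetical.

```python
def variance_explained(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r

# Lowest and highest of the 12 correlations quoted in the review.
lowest_r, highest_r = 0.28, 0.53

# Shares of variance, rounded to three decimals for reading.
shares = {r: round(variance_explained(r), 3) for r in (lowest_r, highest_r)}

for r, share in shares.items():
    print(f"r = {r:.2f} accounts for about {share:.1%} of the variance")
```

On this yardstick even the largest correlation leaves most of the variance in rated success unaccounted for.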

Although this report presents objective evi-
dence on desirable foreman characteristics, it
is doubtful whether the author’s hope that
the findings “. . . can be used to both [...]
the objectives and to increase the effective-
ness of foreman training programs” will be
realized by persons who turn to this report
with that same hope in mind.
Theodore R. Lindbom 
Midland Cooperatives, Inc., 
Minneapolis, Minnesota 


Montagu, A. The natural superiority of
women. New York: Macmillan, 1953. Pp.
205. $3.50.


The first question that confronts the re-
viewer in evaluating this book is “Just why
was it written?” The facts brought out here
about sex differences have been available for
some time to intelligent men and women [...]
the readers of the Saturday Review for whom
this presentation was first designed. The
books of Amram Scheinfeld and Margaret
Mead on the subject have been widely read.

It seems that this particular work is a
polemic rather than simply a popularization [...]
[...] aggressiveness and [...]
[...] failure to promote [...]
kindness and cooperation. These policies he
sees as a consequence of the long-continued
[...]. The psycholog[...]
[...] excel have [...]
systematically devaluated and thus both men
and women have failed to stress the values
which alone can insure the survival of hu-
manity. If we can become convinced that the
female sex is biologically and psychologically
superior, that endurance and resistance to dis-
ease are more important than muscular size
and strength, and that emotional expressive-
ness and social perceptiveness are more im-
portant than aggressiveness and mechanical
[...], we shall have taken the first step to-
ward the new emphasis our times require.
The necessity the author sees to make the
[...] that women are in all ways superior is
responsible for the book’s major defects. In
the first place, he insists again and again that
what he is about to say will be shocking to
[...]. One reads on to encounter some
[...] innocuous fact such as the difference
in [...] rates or frequency of automobile
accidents. Secondly, his argument often leads
[...] reasoning which has sometimes been
[...] labeled “feminine” logic. In Chap-
ter [...], for example, he explains first that there
is no relationship between brain weight and
intelligence and then goes on to argue that
women’s brains are actually larger than men’s
in proportion to total body size. If brain
weight is a matter of no importance, why in-
sist on superiority with regard to it? Thirdly,
[...] orientation produces a certain
distortion in some of the facts themselves.
On the topic of intelligence differences, for
example, he devotes so much more space to
listing all the kinds of evidence for superior
verbal ability in girls than he does to sum-
marizing the kinds of test material on which
boys excel, that his conclusion that girls do 
better than boys on intelligence tests, with a 
few insignificant exceptions, appears plausible. 
On page 121 he quotes Stoddard’s statement 
as to the impossibility of evaluating sex dif- 
ferences in intelligence using our present tests, 
but by this time he has already used the 
available data to support his argument for 
female superiority. (Incidentally this is one 
of the facts he expects to be “shocking” to 
us.) 

Few of us would quarrel with Mr. Montagu’s 
desire to see more love and cooperation in our 
society or his conviction that good relation- 
ships between the sexes are vital. The ques- 
tion is how much such an approach as this 
contributes to these ends. On page 185 he 
asks, “Is it too much to hope that the claims 
herein made for the natural superiority of 
women will shake men out of their com-
placent acceptance of the present position of
the sexes?” My answer would be, “Yes, I
am afraid it is too much to hope. People’s
convictions are not that easily shaken.” But
whatever it may be worth as argument, for
serious students of differential psychology,
the contribution made by this book to our
factual knowledge of sex differences can
safely be ignored. There is nothing new here.

Leona E. Tyler
University of Oregon


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson,
Editor, Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota.


Twenty-five years of sex research history of
the National Research Council Committee
for Research in Problems of Sex, 1922-
1947. Sophie D. Aberle and George W.
Corner. Philadelphia: W. B. Saunders
Company, 1953. Pp. 248. $4.00.

The nature of prejudice. Gordon W. Allport.
Cambridge, Mass.: Addison-Wesley Pub-
lishing Company, Inc., 1954. Pp. 544.
$5.50.

Student personnel services in higher educa-
tion. Dugald S. Arbuckle. New York:
McGraw-Hill Book Company, 1953. Pp.
352. $4.75.

The human person. Magda B. Arnold, J. A.
Gasson, et al. New York: The Ronald
Press Company, 1954. Pp. 585.

Educational psychology. Glenn Myers Blair,
R. Stewart Jones, and Ray H. Simpson.
New York: The Macmillan Company, 1954.
Pp. 601. $4.75.

Introduction to advertising. Arthur J. Brew-
ster, H. H. Palmer, and Robert G. Ingra-
ham. New York: McGraw-Hill Book
Company, 1954. Pp. 480. $5.50.

Handbook of probability and statistics with
tables. Richard S. Burington and Donald
C. May, Jr. Sandusky, Ohio: Handbook
Publishers, Inc., 1953. Pp. 340. $4.50.

Practical applications of democratic adminis-
tration. Clyde M. Campbell. New York:
Harper & Brothers, 1952. Pp. 325. $3.00.

Studies in the scope and method of “The Au-
thoritarian Personality.” Richard Christie
and Marie Jahoda, Editors. Glencoe, Ill.:
The Free Press, 1954. Pp. 279. $4.50.

Journal of personnel administration and in-
dustrial relations. Edited by Lee W.
Cozan. Vol. 1, No. 1, January 1954.
Quarterly. $6.00 per year.

Rehabilitation of the older worker. Wilma
Donahue, James Rae, Jr., and Roger B.
Berry. Ann Arbor: University of Michi-
gan Press, 1953. Pp. 200. $3.25.

Building up the supervisor's job. M. J.
Dooher, Editor. New York: American
Management Association, 1953. Pp. 35.

Essentials of effective administration. M. J.
Dooher, Editor. New York: American
Management Association, 1953. Pp. [...].

Motivation: the core of management. M. J.
Dooher, Editor. New York: American
Management Association, 1953. Pp. 4[...].

The historical roots of learning theory.
Horace B. English. New York: Double-
day and Company, Inc., 1954. Pp. [...].
$.65.

How to choose that career. S. Norman Fein-
gold. Cambridge, Mass.: Bellman Publish-
ing Company, 1954. Pp. 52. $1.00.

Child development. Ilse Forest. New York:
McGraw-Hill Book Company, 1954. Pp.
286. $4.00.

The rating of performance with [...]
films. Paul F. Fornallaz. Madison: Uni-
versity of Wisconsin, School of Commerce,
Bureau of Business Research and Service,
1954. Pp. 35. $1.15.

Feelings and emotions. Lawrence K. Frank.
New York: Doubleday and Company, Inc.,
1954. Pp. 38. $.85.

Elements of statistics. H. C. Fryer. New
York: John Wiley & Sons, Inc., 1954. Pp.
263. $4.75.

Psychology applied to human affairs. Second
Edition. J. Stanley Gray. McGraw-Hill
Book Company, Inc., 1954. Pp. [...].
$6.00.

Measurement and evaluation in the secondary
school. Second Edition. Harry A. Greene,
Albert N. Jorgensen, and J. Raymond
Gerberich. New York: Longmans, Green
and Co., Inc., 1954. Pp. 690. $5.00.

Teaching success of Catholic elementary
school teachers. Sister M. Mynette [...].
Washington, D. C.: The Catholic Univer-
sity of America Press, 1953. Pp. 17[...].

Some observations on executive retirement.
Harold R. Hall. Boston: Harvard Busi-
ness School, Division of Research, [...].
Pp. 298. $3.75.

How to lie with statistics. Darrell Huff.
New York: W. W. Norton & Company,
Inc., 1954. Pp. 142. $2.95.


The process of psychotherapy. Harrington
V. Ingham and Lenore R. Love. New
York: McGraw-Hill Book Company, 1954.
Pp. 270. $5.00.

[...] supervisory behavior. Eugene E.
Jennings. Madison: University of Wis-
consin, School of Commerce, Bureau of
Business Research and Service, 1954. Pp.
[...]. $1.15.

[...] Rating Series. Joseph E. King
and [...] W. Wingert. Chicago: Indus-
trial Psychology, Inc., 1953. Pp. 52.
$3.6[...].

Developments in the Rorschach technique.
Volume I: Technique and theory. Bruno
Klopfer, Mary D. Ainsworth, Walter G.
Klopfer, and Robert R. Holt. Yonkers-on-
Hudson, N. Y.: World Book Company,
1954. Pp. 726.

Statistical methods in experimentation: an
introduction. Oliver L. Lacey. New York:
The [...] Company, 1953. Pp. 249.
[...].

[...] communications. Spencer A.
[...], Editor. Detroit: Wayne Univer-
sity Press, 1952. Pp. 282. $1.75.

The na[...] man. Clarence Leuba. New
York: Doubleday and Company, Inc., 1954.
Pp. 70. $.95.

Measuring group cohesiveness. Lester M.
Libo. Ann Arbor: University of Michigan
[...]. Pp. 111. $2.00.

Handbook of social psychology. Volume I.
Gardner Lindzey, Editor. Cambridge,
Mass.: Addison-Wesley Publishing Com-
pany, Inc., 1954. Pp. 704. $8.50.

Handbook of social psychology. Volume II.
Gardner Lindzey, Editor. Cambridge,
Mass.: Addison-Wesley Publishing Com-
pany, Inc., 1954. Pp. 704. $8.50.

[...] the sub-normal child. Frances
Lloyd. New York: Philosophical Library,
[...]. Pp. 148. $3.75.

Student counseling in Japan. Wesley P.
Lloyd. Minneapolis: University of Minne-
sota Press, 1953. Pp. 204. $4.00.

Areas of psychology. F. L. Marcuse, Editor.
New York: Harper & Brothers, 1954. Pp.
532. $5.00.


Understanding the Japanese mind. James 
Clark Moloney. New York: Philosophical 
Library, 1954. Pp. 252. $3.50. 


Revised Minnesota Occupational Rating
Scales. Donald G. Paterson, C. D’A.
Gerken, and Milton E. Hahn. Minne-
apolis: University of Minnesota Press,
1953. Pp. 85. $2.00.

An introduction to clinical psychology. Sec- 
ond Edition. L. A. Pennington and Irwin 
A. Berg, Editors. New York: The Ronald 
Press Company, 1954. 

Counseling: theory and practice. Harold B.
Pepinsky and Pauline N. Pepinsky. New
York: The Ronald Press Company, 1954.
Pp. 328. $4.50.

Music therapy. Edward Podolsky. New
York: Philosophical Library, 1954. Pp.
335. $6.00.

Mid-century crime in our culture. Austin L.
Porterfield and Robert H. Talbert. Fort
Worth: Leo Potishman Foundation, Texas
Christian University, 1954. Pp. 113. $2.25.

The personnel administrator at the crossroads.
John Post. New York: American Manage-
ment Association, 1953. Pp. 54. $1.25.

Introduction to educational psychology. H.
H. Remmers, Einar R. Ryden, and Clellen
L. Morgan. New York: Harper & Broth-
ers, 1954. Pp. 420. $4.00.

Introduction to opinion and attitude meas-
urement. H. H. Remmers. New York:
Harper & Brothers, 1954. Pp. 437. $5.00.

The high school student. John W. M. Roth-
ney. New York: The Dryden Press, 1953.
Pp. 271. $1.90.

Personality dynamics. Bert R. Sappenfield.
New York: Alfred A. Knopf, Inc., 1954.
Pp. 412. $5.50.

Psychological problems in mental deficiency.
Second Edition. Seymour B. Sarason.
New York: Harper & Brothers, 1953. Pp.
402. $5.00.

The clinical interaction: with special refer-
ence to the Rorschach. Seymour B. Sara-
son. New York: Harper & Brothers, 1954.
Pp. 369. $5.00.

Personnel management. Walter Dill Scott,
Robert C. Clothier, and William R.
Spriegel. New York: McGraw-Hill Book
Company, 1954. Pp. 690. $6.50.

Personal adjustment in the American culture. 
Franklin J. Shaw and Robert S. Ort. New 
York: Harper & Brothers, 1953. Pp. 388. 
$4.00. 

Groups in harmony and tension. Musafer 
Sherif and Carolyn W. Sherif. New York: 
Harper & Brothers, 1953. Pp. 316. $3.50. 

Man in society. George Simpson. New 
York: Doubleday and Company, Inc., 
1954. Pp. 90. $.95. 

Occupational books: an annotated bibliog- 
raphy. Sarah Splaver. Washington: Biblio 
Press, 1952. Pp. 135. $4.00. 

Annual review of psychology. Volume 5.
C. P. Stone, Editor. Stanford, Calif.: An-
nual Reviews, Inc., 1954. Pp. 455. $7.00.

Contemporary theories of learning. Louis P.
Thorpe and Allen M. Schmuller. New
York: The Ronald Press Company, 1954.
Pp. 450.

Manual of psychological medicine. Third 
Edition. A. F. Tredgold and R. F. Tred- 
gold. Baltimore: The Williams & Wilkins 
Co., 1953. Pp. 328. $7.00. 

The juvenile offender. Clyde B. Vedder.
New York: Doubleday and Company, Inc.,
1954. Pp. 510. $6.00.

The [...] situation. W. Edgar
Vinacke. Honolulu 14, Hawaii: University
of Hawaii, 1954. Pp. 32. $.65 plus post-
age.

Statistical methods in educational and psy-
chological research. James E. Wert, Charles
O. Neidt, and J. Stanley Ahmann. New
York: Appleton-Century-Crofts, Inc., 1954.
[...] $5.00.

Free and unequal: the biological basis of in-
dividual liberty. Roger J. Williams. Aus-
tin: University of Texas Press, 1954. Pp.
177. $3.50.


Journal of Applied Psychology 


Vol. 38, No. 4


AUGUST, 1954


Social Status of Industries 


Arthur H. Brayfield and Carroll E. Kennedy, Jr.

Kansas State College

and

William E. Kendall

Chesapeake and Ohio Railway, Cleveland, Ohio


In [...] demonstrated that occupa-
tions can be arranged in order of social pres-
tige (3). [...] prestige is usually associ-
ated with the professional and “higher” busi-
ness occupations. Skilled trades, technical,
[...] occupations occupy an inter-
mediate [...] the semiskilled
[...] occupations ranked at the bot-
tom of the hierarchy. Research on the social
status of occupations has continued and the
[...] arrangement has been well estab-
lished. The [...] study was repeated with
minor [...] 21 years later by Deeg and
Paterson (4), [...] found almost no change in the
social [...] rankings during the intervening
[...].

[...] social status hierarchy exist among
industries? It occurred to the writers that an
investigation of this question might be inter-
esting [...]. A review of the litera-
ture [...] no such studies. In this paper
we report [...] exploratory attempt to ascer-
tain (1) whether or not an industrial hier-
archy [...], and (2) the possible influence
of occupational status stereotypes upon the
identification of such a hierarchy.


Method

The ranking method for this investigation was
[...] procedure similar to that of the
[...] of occupational prestige hierarchies
[...] closely patterned after Baudler and
Paterson (1). An alphabetical list of 29 in-
dustries [...] to 68 men and
52 women [...] of the same class in Gen-
eral Psychology with instructions to “rank
according to what you think their social
standing is in your community or state.” At
least one industry from each of the 9 major
divisions in the Standard Industrial Classifi-
cation Manual (5) was included. Competi-
tive industries were included in a few in-
stances as, for example, bus companies, air
transport, railroads, and trucking companies.


The respondents were predominantly college 
freshmen and sophomores representing 26 dif- 
ferent curriculums. The median rank and its 
quartile deviation were computed for each in- 
dustry and the industries were then placed in 
rank order according to their median values. 
The rank order correlation (rho) between
the men’s and women’s rankings was computed.
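
The steps just described (a median rank and quartile deviation for each industry, a rank order built from the medians, and a rho between the two sexes' rank orders) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure. The four industries "A" to "D" and all rankings are invented. The helper names are hypothetical. Quartiles use the default method of Python's `statistics.quantiles`, since the paper does not say how quartiles were located. Rho is computed as the product-moment correlation of average ranks, so that tied medians are handled.

```python
from statistics import median, quantiles

def quartile_deviation(values):
    """Semi-interquartile range, (Q3 - Q1) / 2."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

def average_ranks(scores):
    """Rank scores from low to high; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Rank order correlation: correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented data: ranks given to each industry by five men and five women.
industries = ["A", "B", "C", "D"]
men = {"A": [1, 1, 2, 1, 2], "B": [2, 3, 1, 2, 1],
       "C": [3, 2, 3, 4, 3], "D": [4, 4, 4, 3, 4]}
women = {"A": [1, 2, 1, 1, 2], "B": [3, 3, 2, 4, 3],
         "C": [2, 1, 3, 2, 1], "D": [4, 4, 4, 3, 4]}

men_medians = [median(men[name]) for name in industries]
women_medians = [median(women[name]) for name in industries]
men_qd = {name: quartile_deviation(men[name]) for name in industries}

# Rank order of industries within each sex, then agreement between sexes.
men_order = average_ranks(men_medians)
women_order = average_ranks(women_medians)
rho = spearman_rho(men_medians, women_medians)
print(men_order, women_order, round(rho, 2))
```

With 29 industries in place of four, the same three quantities would correspond to the columns of a table of rank orders, median rankings, and quartile deviations for each sex.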

A subsidiary problem was to attempt to dis- 
cover whether or not respondents were influ- 
enced by the social status of a particular oc- 
cupational level stereotype which might be 
associated with any given industry. The 
method employed was to vary the instruc- 
tions to four additional groups of respondents 
who ranked the same list of industries. A
total of 48 men and 76 women from classes
in General, Educational, and Social Psychol-
ogy responded to instructions to rank the 29
industries “according to what you think the
social standing of an executive in each of the
industries is in your community or state.”
An additional 48 men and 66 women from
General and Educational Psychology ranked
the industries under instructions to “rank ac-
cording to what you think the social stand-
ing of a laborer in each of the industries is in
your community or state.”
The results for the latter four groups were
treated statistically as for the two base
method groups and intercorrelations among
the three methods were computed by sex.


Arthur H. Brayfield, Carroll E. Kennedy, Jr., and William E. Kendall


Table 1

Rank Order of 29 Industries Based on Median Social Status Rankings by 68 Men
and 52 Women College Students *

                                                  Men                           Women
                                       Rank   Median   Quartile     Rank   Median   Quartile
Industry                               Order  Ranking  Deviation    Order  Ranking  Deviation

Medical services                        1      2.1      1.9          1     [...]     1.3
Banks                                   2      2.7      1.3          2     [...]     1.4
Education                               3      4.8      2.7          3      5.3      2.8
Federal government                      4      5.4      3.9          4     [...]    [...]
Farming                                 5      8.5      8.7          5      7.1      5.3
Local government                        6     10.8      6.7          6      8.5      5.3
Aircraft manufacturing                  7     11.5      4.7         16     14.8      3.7
Broadcasting companies                  8.5   12.5      6.3          7      9.5      4.9
Real estate companies                   8.5   12.5      5.5          8     10.8      5.3
Air transport companies                10.5   13.2      3.8         11.5   13.8      3.9
Electric light companies               10.5   13.2      4.8         11.5   13.8      5.6
Automobile manufacturing companies     12     13.5      5.7         20     17.5      6.6
General building construction          13.5   14.0      4.8         13     14.2      3.1
Telephone companies                    13.5   14.0      5.0         14     14.3      6.6
Chemical manufacturing companies       15     14.3      4.9         15     14.5      5.4
Machinery manufacturing companies      16     14.5      5.4         21     18.5      3.9
Food manufacturing companies           17     15.3      4.9         18     16.5      5.0
Publishing companies                   18     15.8      6.2          9     12.5      5.8
Motion picture companies               19     16.3      7.9         10     13.5      3.9
Railroads                              20     16.7      6.8         18     16.5      5.1
Retail drug companies                  21     18.5      4.6         18     16.5      4.1
Furniture manufacturing companies      22     18.7      4.2         24     20.4      3.8
Wholesale drug companies               23     19.3      3.6         23     19.8      5.0
Hotels                                 24     21.0      4.9         25     21.5      5.8
Oil drilling companies                 25     21.5      6.8         26     22.0     [...]
Bus companies                          26     22.0      4.8         22     19.5     [...]
Trucking companies                     27     23.7      3.9        [...]   26.0     [...]
[...]                                  28     [...]    [...]       [...]   [...]    [...]
Coal mining companies                  29     27.0      2.9         29     28.2     [...]

* Median rankings and quartile deviations reported to one decimal place only although median rank [...] decimal places.


Results


The results of the rankings by the base
method groups are shown in Table 1. The
median rankings of industries distribute them-
selves over a wide range (from 2 to 27)
whereas chance responses would yield a
clustering around the median value of [...].
It is evident from inspection of the quartile
deviations that there is much greater agree-
ment on the industries ranked extremely high
and low than on those ranked in the middle
of the distribution, for [...].

The correlational results by sex for the
three ranking methods are summarized in
Table 2. The correlations are all significant
beyond the 1% level. The influence of oc-
cupational stereotype is small since the cor-
relations are of substantial magnitude.
Men and women agreed markedly in their
rankings irrespective of method. For the
base method the rho was .90, for “Execu-
tive,” .90, and for “Laborer,” .93.

The existence of an industrial status hier-
archy seems to be well established by the re-
sults of the administration of the three
methods. The assignment of ranks is obviously
not a chance phenomenon and is relatively
uninfluenced by sex.


Table 2

Intercorrelations (rho) between Three Methods of
Ranking 29 Industries on Social Status, by Sex

In the upper right-hand half of the table the inter-
correlations for the rankings by men are given and in
the lower left-hand half of the table the intercorrela-
tions for the rankings by women are given.

                          Method
               Base   “Executive”   “Laborer”

Base            --       .89           .89
“Executive”    .78       --            .81
“Laborer”      .92       .84           --
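
One way to check the statement that correlations of this size are significant beyond the 1% level is the usual t approximation for a rank correlation, t = rho * sqrt((N - 2) / (1 - rho^2)), referred to N - 2 degrees of freedom. The sketch below applies it to the six intercorrelations in Table 2 with N = 29 industries. The choice of test and the function name `t_for_rho` are assumptions made for illustration, since the paper does not say which test was used. The value 2.771 is the tabled two-tailed 1% critical value of t for 27 degrees of freedom.

```python
import math

N_INDUSTRIES = 29
T_CRITICAL_01 = 2.771  # two-tailed 1% point of t with 27 degrees of freedom

def t_for_rho(rho, n):
    """t statistic for testing a rank correlation against zero."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Intercorrelations from Table 2, keyed by (sex, method, method).
table2 = {
    ("men", "Base", "Executive"): 0.89,
    ("men", "Base", "Laborer"): 0.89,
    ("men", "Executive", "Laborer"): 0.81,
    ("women", "Base", "Executive"): 0.78,
    ("women", "Base", "Laborer"): 0.92,
    ("women", "Executive", "Laborer"): 0.84,
}

t_values = {key: t_for_rho(rho, N_INDUSTRIES) for key, rho in table2.items()}
all_significant = all(t > T_CRITICAL_01 for t in t_values.values())

for key, t in t_values.items():
    print(key, round(t, 2))
print("all beyond the 1% level:", all_significant)
```

Even the smallest coefficient in the table gives a t well above the critical value, which agrees with the statement in the text.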


The [...] of such a prestige hier-
archy is obscure. For example, the high
rank [...] to farming by all groups in
this study may reflect a geographical factor.
We attempted to ascertain the influence of a
possible occupation level stereotype upon
the rankings but found little influence within
the limitations of the method used. Since the
operation of at least a white collar-blue collar
stereotype seems probable from an inspection
of the rankings a more intensive study of this
factor [...] well be undertaken.

[...] interesting differences within a
[...] industrial classification. For example,
furniture manufacturing did not rank higher
than [...] on any of the lists while aircraft
manufacturing ranked below 8th on only one
of the six rankings. In the field of trans-
portation, bus companies and trucking com-
panies consistently ranked well toward the
bottom while other forms of transportation
enjoyed a considerably higher status. On the
other hand, there was no reliable differentia-
tion between electric light and telephone com-
panies selected as representative of utilities.

The findings of this study should be of in-
terest to several groups. A few industries
have demonstrated their concern for public 
opinion by conducting confidential surveys of 
the public’s attitude toward them. The so- 
called institutional advertising campaigns are 
further evidence of this concern. Personnel 
workers are aware of the influence of public 
opinion on their recruiting programs (2, p. 
88). Further, the prestige associated with an 
industry may be a factor in job satisfaction. 

Vocational counselors should be alert to the 
possible influence of the industrial status hier- 
archy on the vocational plans of their coun- 
selees. The methodology employed is poten- 
tially useful to placement officers in schools, 
colleges, and public and private employment
offices.


Summary


The existence of a prestige hierarchy among 
industries was established through the use of 
a ranking method employed with college un- 
dergraduates representative of a variety of 
curriculums. The influence of occupational 
level stereotypes was studied and found to be 
negligible for the populations studied and the 


method used. 


Received August 27, 1953. 


References 


1. Baudler, Lucille and Paterson, D. G. Social status of women's
   occupations. Occupations, 1948, 26, 421-424.

2. Bellows, R. M. Psychology of personnel in business and industry.
   New York: Prentice Hall, 1949.

3. Counts, G. S. The social status of occupations: a problem in
   vocational guidance. Sch. Rev., 1925, 33, 16-27.

4. Deeg, Maethel E. and Paterson, D. G. Changes in social status of
   occupations. Occupations, 1947, 25, 205-208.

5. Standard industrial classification manual. Manufacturing industries
   (Vol. I) and Non-manufacturing industries (Vol. II). Washington:
   U. S. Government Printing Office, 1942.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954 


Manager-Employee “Understanding” in the Retail Grocery 
and Meat Market 


Pietro V. Marchetti 


University of Illinois 


We have chosen, somewhat arbitrarily, the 
term “understanding” as a label for the trait 
or ability of interest to us in this study. This 
term has been used previously by others as 
we are using it. Still other investigators have 
used the words empathy, social psychological 
empathy, and social perception. The ability 
we are interested in is that of being able to 
place one’s self in the position of another.
We have taken as an indicant of it simply 
the accuracy with which one person is able to 
predict the responses that another will make 
to some given stimulus situation. The now 
very popular technique, and the one we have 
employed, is to have one person predict the 
responses of another [...] device, [...] employed.

In much common sense s[...] determiners of eff[...] was an inadequate
one. [Leadership] appears rather to be a working relationship among
members of a group in which the leader acquires status through active
participation and demonstration of his capacity for carrying
cooperative tasks through to completion. Significant aspects of this
capacity appear to be intelligence, alertness to the needs and [...]
insight into situations, [...]



Gibb has prepared a survey of those leadership studies emphasizing the
interactional relationship between the leader's traits and the
characteristics of the particular situation in which he functions.
Gibb writes, "The function of the leader is to embody and give
expression to the needs and wishes of the group and to contribute
positively to the satisfaction of these needs" (6, p. 20 f.).
Roethlisberger (13) and Barnard (1), among many others who have
written in the area of industrial leadership, point to such an ability
as an important one in effective leadership. We might note
parenthetically that Barnard, a professional industrial manager, wrote
in 1940, "Leadership appears to be a function of at least three
complex variables: the individual, the group followers, the
conditions" (1, p. 16). From his observations in the industrial
enterprise he arrived at a [statement] about leadership quite in accord
with the psychologist's interactional theories of leadership which have
replaced earlier trait theories.

One psychological study we would [cite] briefly, which served as a
major impetus to our own work, is that of Chowdhry (4). She has
suggested situational-traits, traits common to leadership and yet a
function of the situation. She found, in general, that the sociometric
leader in the groups she studied (primarily college student groups)
could make more accurate judgments or estimates of group opinion than
could the non-leaders, defined sociometrically.

The studies of Meyer (12) and of Cantor (3) are two studies, from the
pertinent literature, very closely related to our own. Meyer studied
200 first-line supervisors in a large utility company. He asked the
supervisors to predict the behavior of other persons, who had been
described briefly for them, in certain interpersonal situations. There
was evidence that the better supervisors regarded others as
individuals with motives, feelings




and goals of their own. The poorer leader was more likely to perceive
others in relation to his own motives or goals. Cantor did an
evaluation study of a human relations program in the Farm Bureau
Insurance Companies in Ohio. One of his findings was that following
the supervisory training conferences the [supervisors showed] gains in
scores on a [measure of] the ability to estimate group opinion.

[Also] in this sampling of earlier studies which relate to the
suggested factor of the leader's understanding of the followers there
are the [University] of Michigan Survey Research Center studies of
productivity, supervision and morale. One of these was carried out in
a large life insurance firm (7) and the other with gangs of men who
maintain sections of railroad right of way (8). In each of these
studies there was evidence that employee groups of higher productivity
were under [supervisors] or foremen who were more employee centered.
There was evidence of their taking more interest in their employees
than did the leaders of lower productivity units. There was some
evidence that they considered the possible needs and motives of their
employees in their interpretations of the behavior of employees. This
last finding may be compared with the results of a study of Maas (11).
He reports that the leaders of youth groups sponsored by community
agencies, [after] taking courses intended to make for more effective
leadership of youths, made more of what he calls causal reactions
rather than judgmental reactions to the behavior of youths. He gives
as an example of a C-reaction, "Joe is smoking a pipe perhaps because
he is the smallest boy in the group or perhaps to rebel against
paternal sanctions against smoking"; and as a J-reaction, "Joe is a bad
boy," or "Joe's only fault is smoking."

Granting as a factor in effective leadership, the leader's ability to
understand the group members, a number of questions about the ability
quickly arise. One is suggested by Levine's (9) discussion of
leadership. There may be a curvilinear relationship between
understanding and leadership effectiveness. With too limited
understanding we may have Levine's formalistic leader who is not very
successful in motivating the group members and eliciting from them a
genuine contribution of



their efforts to the group’s tasks. The op- 
posite extreme of understanding may make 
for Levine’s anarchic leader acutely sensitive 
to the feelings of the group members but in- 
capacitated by his lack of ability to abstract, 
to see beyond the concrete. Apart from the 
question of optimal amount of understanding 
there is also the question about the kind of 
understanding. That is to say, there are 
many different things that one might know 
about another person—many different aspects 
of another’s personality that one might un- 
derstand or know about. It is reasonable to 
assume that the various possible kinds of un- 
derstanding that one might have of another 
are not equally important in the leader-fol- 
lower relationship in the job situation. Perti- 
nent to this question is work of Luszki (10) 
on empathic ability and social perception. 
She presents evidence of some independence 
between the ability of one person, A, to pre- 
dict the responses of another, B, to a stimu- 
lus situation not involving A, and A’s ability 
to predict the responses of B to a stimulus 
situation which does involve A. She speaks 
of detached as compared with participant observer skill. We shall
borrow these terms and use them analogously as adjectives for
understanding. A third question we ask is that of differences among
work situations in terms of the degree to which effective leadership
in the situation is determined by or associated with the leader's
understanding of the group members. Where the group tasks are such
that individuals function more as automatons the leader's
understanding of the group members may be of less importance in
leadership effectiveness, particularly so if we take some aspect of
group productivity as a criterion of leadership effectiveness. There
is a fourth and final question we would raise at this point. This has
to do with the relation of “apparent” to “real” understanding. The
latter is the sort of understanding with which we are concerned in the
present study. This is related to the amount and kind of knowledge
that one person has of another which makes it possible for him to make
an accurate prediction of how the other person will respond to a given
situation. The person A may have such understanding of B. B, however,
may not have such understanding of C


and yet he may appear to. A, with the 
knowledge or understanding that he has of B 
can select, with a minimum of trial and error, 
the appropriate stimulus situation with which 
to confront B in order to elicit a given kind 
of response from B. B on the other hand 
does not have such understanding of C. 
Nevertheless, he is able to elicit, from C, a 
desired response. B’s success in this may be 
primarily a matter of trial and error. He 
may be able to confront C with one stimulus 
situation after another noting very quickly 
any immediate cues that C may be giving 
him, on the basis of which B can predict 
what C’s more complete response to the 
stimulus situation would be. On this basis 
B can decide if it will be necessary to pre- 
sent C with still some other stimulus situa- 
tion in order to elicit the desired response or 
not. Another recognition of this problem is 
to be found in a discussion of leadership by 
Smith (15). 

The task we have set for ourselves is that 
of making a more frontal attack upon the 
problem of leader-follower (and follower- 
leader) understanding in the job situation. 
We hope to determine in various kinds of job 
situations those variables in the job situation 
which are correlates of the understanding (or 
rather, understandings) between employee 
and immediate supraordinate. We are principally concerned, of course,
with the degree to which such understandings may correlate with job
satisfaction, that is, attitudes of [...] with group ef[...] and group
eff[...] understanding with group effi[...]iveness. As we [...]ness or
produc[...] is today, many [...]chnological fac[...] of interpersonal
[...] and immediate supraordinate. The obvious difficulty in
attempting to demonstrate the correlation between understanding and
group efficiency is
that of developing an adequate indicant of 
efficiency of the group—an indicant that 
would reflect the psychological costs, to the 




individual worker, of the work accomplished by him. We are thinking
here of Ryan's discussion of cost of work to the individual in his
Work and effort (14). We have also in mind Barnard's discussion of the
relation of efficiency of the group to the individual efficiencies of
its members, in his The functions of the executive (2). It is hoped
that ultimately such studies might contribute to more effective
training as well as selection of persons to serve in supervisory
capacities in various work situations. The results of such studies may
also contribute to more effective matching of employee and supervisor.
We may eventually be able to consider in the placement of personnel an
additional variable and that would be the degree to which an employee
might be expected to be enigmatic to a particular supervisor (or the
supervisor enigmatic to the employee). The objective, of course, would
be to so match employee and supervisor that there might be adequate
understanding one of the other.


Procedure 


In the present study the subjects [...] the employees and [...] units
and two [...]. It was the first unit in [...]. It soon became apparent
[...] questionnaires we had [...] questionnaires the study was [...]
42. More sp[...] B, 21; C, 3; [...] E, 6; F, 4; G, 8; H, [...] 42;
J, 8; and K, 14. The Units J and K are meat markets.

Measures of Understanding. As indicated earlier, the measures,
generally, are [...] response categories, in some order, from [...].
[If] B chooses the [...] and A predicts that B's choice will be 2, we
shall say [...] A's error for the item. All of our understanding
measures are just such mean error scores.
Our four measures of understanding were the manager's detached
understanding of the employee (MDU), the manager's participant
understanding of the employee (MPU), the employee's detached
understanding of the manager (EDU), and the employee's participant
understanding of the manager (EPU). For each employee in each unit we
determined an MDU, MPU, EDU, and EPU. The MDU is the mean number of
errors made by the manager in his predictions of the employee's
responses to the items of the Tear Ballot for Industry. The MPU is the
mean number of errors made by the manager in his predictions of the
responses of the employee to a questionnaire we have labeled
Supervisory Practices Questionnaire. This is simply a shortened form
of an instrument developed by Fleishman (5). The employee's responses
to the items of this questionnaire indicate how the employee thinks
that his manager typically behaves toward his employees. A sample item
is, “He criticizes people under him in front of others.” The response
categories are: 1. Often; 2. Fairly often; 3. Occasionally; 4. Once in
a while; and 5. Very seldom. The EDU is the mean number of errors made
by the employee in his predictions of the manager's responses to the
Supervisory Practices Questionnaire with the items so reworded that
the manager's responses indicate how he thinks that he, the manager,
typically behaves toward his employees. The EPU is the mean number of
errors made by the employee in his predictions of his rating by the
manager. The manager rated each employee on each of the following
seven characteristics: (1) how the employee receives orders and
suggestions; (2) [...] relations; (3) initiative; (4) acceptance by
fellow workers; (5) promotability; (6) personal appearance; and (7)
general effectiveness on present job. These characteristics were
suggested in descriptions of poor and good employees which were
obtained in interviews with two of the managers.
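
As an illustration of the kind of score described above, the following is a minimal sketch in Python. It assumes that the "error" on an item is the absolute difference between the predicted and the actual response category; the source text defining the error is partly illegible, so this is an assumption, and the function name and the sample responses are hypothetical.

```python
from statistics import mean


def mean_error_score(predicted, actual):
    """Mean error of one person's predictions of another's responses.

    ASSUMPTION: the error on an item is the absolute difference between
    the predicted and the actual response category (e.g. 1-5).
    A lower score indicates more accurate prediction.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same items")
    if not predicted:
        raise ValueError("at least one item is required")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical data: a manager predicts one employee's answers to five items.
manager_predictions = [2, 3, 1, 5, 4]
employee_responses = [4, 3, 2, 5, 1]
mdu = mean_error_score(manager_predictions, employee_responses)
```

Under this reading, each of MDU, MPU, EDU, and EPU would be the same computation applied to a different pair of predictor and questionnaire.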
In each of the eleven units we determined, for each of the four
understanding measures, a split-half (odd-even) reliability to which
we applied the Spearman-Brown formula to obtain an estimate of the
reliability of a test doubled in length. For each of our four
measures, then, there were eleven estimates of reliability, one
obtained in each unit. The median reliability estimates for the
measures MDU, MPU, EDU and EPU are, respectively, .78, .82, .79, and
.83.
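
The reliability procedure just described can be sketched as follows. This is an illustrative Python outline, not the author's computation: the function names and the sample item scores are hypothetical, and the odd-even split is taken over item positions. The Spearman-Brown step is the standard formula, r' = k r / (1 + (k - 1) r), with k = 2 for a test doubled in length.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a test lengthened by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability, stepped up with Spearman-Brown.

    item_scores: one row per person, one column per item (hypothetical layout).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical item-level error scores for four persons on four items.
scores = [[1, 2, 1, 2], [2, 3, 2, 3], [3, 3, 4, 4], [5, 4, 5, 5]]
reliability = split_half_reliability(scores)
```

A half-test correlation of .50, for example, steps up to 2(.50)/(1 + .50), or about .67, for the full-length test.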
Results 
Our results are given in the form of rank-order coefficients of
correlation between each of the four measures of understanding and
(a) the manager’s ratings of the employees; (b) the evaluation of the
manager by the employees on the Supervisory Practices Questionnaire;
(c) the job satisfaction of the employees as measured by the Tear
Ballot; and
(d) the efficiency of the unit as evaluated 
subjectively by a member of management 
supraordinate to the unit managers. Each 
correlation coefficient is based upon eleven 
cases; the eleven units ranked in terms of 
the mean MDU of the unit, the mean MPU, 
the mean EDU, and the mean EPU. The 
units were, of course, also ranked in terms 
of the mean ratings by the manager of em- 
ployees in the unit; the mean evaluation of 
the manager by the employees in the unit; 
the mean job satisfaction of the employees; 
and finally the units were placed in a rank 
order of efficiency by the managers’ supra- 
ordinate. 

There were no well founded hypotheses 
about the direction of correlation between 
the understanding measures and the ratings 
of employees nor about the direction of cor- 
relation between these measures and the em- 
ployees’ evaluations of the managers. For 
this reason the so-called two-sided test of 
significance of the correlation coefficient is 
considered appropriate. For eleven cases the 
rank-order coefficient must be .60 for signifi- 
cance at the five per cent level of confidence 
and .74 for significance at the one per cent 
level. We did hypothesize positive correla- 
tions between the measures of understanding 
and employee satisfaction as well as between 
these measures and the efficiency ratings of 
the units. These relationships are suggested 
both by common sense speculation and earlier 
empirical studies. To test the significance of 
these correlations we have used the one-sided 
test of significance. For eleven cases the 
rank-order coefficient must be .54 for signifi- 
cance at the five per cent level of confidence 
and .73 for significance at the one per cent 
level. These results are summarized in 
standard type, in Table 1. 
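
The decision rule described in this paragraph can be sketched in Python. The critical values are those quoted in the text for eleven cases; everything else (function names, the use of average ranks for ties, computing the coefficient as a product-moment correlation of ranks) is an assumption made for illustration, since the paper does not state how ties were handled.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Rank-order coefficient: product-moment correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Critical values for eleven cases, as quoted in the text.
CRITICAL_N11 = {
    "two_sided": {0.05: 0.60, 0.01: 0.74},
    "one_sided": {0.05: 0.54, 0.01: 0.73},
}


def is_significant(rho, test, level):
    """Compare a coefficient with the quoted critical value."""
    critical = CRITICAL_N11[test][level]
    if test == "two_sided":
        return abs(rho) >= critical
    return rho >= critical  # one-sided: only positive correlation was hypothesized
```

For example, a coefficient of .56 meets the one-sided five per cent criterion (.54) but not the two-sided one (.60).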

It quickly becomes apparent from the data in Table 1 that there is no
significant correlation between any of the understanding measures and
either the ratings of employees or evaluations of managers by
employees. Employee job satisfaction, on the other hand, does seem to
have some correlation with the manager detached and participant
understanding of the employees. The correlation between MDU and job
satisfaction is significant at the five per cent level (using the
one-sided test) and the MPU and job satisfaction correlation very
closely approaches significance at the five per cent level. There is
but the suggestion of correlation between EDU and job satisfaction.


Table 1

Correlations Between Understanding Scores and (1) Employee Ratings by
the Manager; (2) Evaluation of the Manager by the Employees; (3) Job
Satisfaction of the Employees; and (4) Efficiency of the Retail Unit

                                     Eleven units      Nine grocery units
                                     (standard type)   (italics)

Employee Ratings and
  MDU                                .10               .32
  MPU                                .08               .16
  EDU                                .07               .43
  EPU                                [illegible]       -.08
Job Satisfaction of Employees and
  MDU                                .56               [illegible]
  MPU                                .53               .49
  EDU                                .43               .48
  EPU                                .3[?]             .42
  MDU/SL                             .07               [illegible]
  MPU/SL                             .72               .74
Evaluation of Manager and
  MDU                                .00               -.26
  MPU                                -.02              [illegible]
  EDU                                [illegible]       .20
  EPU                                .07               [illegible]
Efficiency of the Retail Unit and
  MDU                                .20               .43
  MPU                                .35               .62
  EDU                                [illegible]       .22
  EPU                                .52               .96
  MDU/SL                             .55               .68
  MPU/SL                             .61               .63


In each unit we were able to identify one 
to three people as the one(s) receiving the 
greater proportion of choices or votes on a 
sociometric questionnaire. The sociometric 
criterion question asked of each employee in 
each unit, was answered by the employee’s 
singling out the one of his fellow employees 
whom he would most like to have go with him 
if he were to be transferred to another unit 
in the same company. In each unit we de- 
termined the mean MDU and MPU based 
not upon all of the employees in the unit, as 
we did originally, but now based only upon 
the most frequently chosen persons in the 




unit. We may think then of the manager’s detached as well as
participant understanding of the sociometric leaders in the unit:
MDU/SL and MPU/SL, respectively. MDU/SL has no significant correlation
with the mean job satisfaction (the mean of all employees’ job
satisfaction scores, as originally determined). The MPU/SL, however,
does correlate significantly with job satisfaction, almost at the one
per cent level. In the present study, then, we find that the greater
the accuracy of the unit managers in predicting how they are evaluated
by those employees in their respective units, who are most frequently
chosen on a sociometric questionnaire, the greater the mean job
satisfaction of the unit as a whole.

The efficiency ratings of the units by the managers’ supraordinate
correlate significantly (at the five per cent level) with M[?]U,
MDU/SL, and MPU/SL. Their correlation with EPU closely approaches
significance at the five per cent level.

Turning again to Table 1 and noting the italicized coefficients we
find that these coefficients (with two or three exceptions) are
relatively of the same order of magnitude as the coefficients
discussed earlier. The coefficients in italics are based upon the nine
grocery units. The two meat markets are excluded. Obviously, with the
change in the number of cases the values of correlation coefficients
for the two levels of confidence, which we cited above, do not apply
here. The major differences between the two sets of coefficients are
that the correlations of MDU and EDU with the employee ratings more
closely approach statistical significance with the meat markets
excluded; and the correlation between EPU and the efficiency ratings
of the units becomes appreciably greater. However, these differences
do not appear to be such that they suggest that the grocery and meat
market units differ significantly in terms of the relationships
explored in this study, [that is, the relationships] between measures
of [understanding and] employee ratings, [evaluations of managers] by
employees, employee [job satisfaction], and ratings of efficiency of
[the units]. The sample of meat markets numbering but two made it
impracticable to [test] this statistically. It is proposed to [...]
further with additional grocery and meat market units, preferably
including [units] from other companies in order that [there may] be
some test of the generality of the [relationships] that did emerge, or
were [suggested], by the results of the present study.

Our [results] appear to us to suggest that [further] work along the
general lines of this [study is] warranted. We should like to employ
the same as well as some different [measures] of understanding between
leader and follower in different work situations. It would seem
profitable to use other than a [single measure] of job satisfaction;
that is, [measures] of different aspects of job satisfaction (or
morale) in order to determine how different [kinds] of understanding
may relate to the various factors of job [satisfaction]. We should
also like to explore the relationship between measures of
understanding and productivity as well as efficiency of the group. We
have certain reservations about the efficiency ratings of the present
study. We do know that the managers’ supraordinate who did the ratings
felt that [he] should not be too influenced by differences among the
units in terms of net profits. [There] are factors determining these
profits, which are beyond the control of the personnel of the unit. A
principal one is that at higher levels of management it may be decided
to price merchandise differently in different units, in attempts to
determine optimal prices. The rater reported as one basis of his
judgments, the criticisms, favorable and unfavorable, made by
customers to the company about the service and personnel of the
different units. The rater also considered the suggestions originating
with the personnel of the units for the improvement of the [...] of
the units. Another consideration was the physical appearance of the
store: its [clean]liness and the effectiveness of the displays of
merchandise.


Summary

Certain earlier studies suggesting the present study have been
reviewed very briefly. Several [measures] of understanding between
[managers] and employees in the retail grocery and meat market have
been described. The correlations of these measures with the following
variables have been reported: (1) manager’s rating of the employees;
(2) employees’ evaluation of the manager; (3) job
satisfaction of the employees; and (4) rat- 
ings of efficiency of the units. None of the 
measures of understanding correlated signifi- 
cantly with either the first or second of the 
above variables. Certain ones of the under- 
standing measures did correlate significantly 
with the third and fourth variables. 


Received August 6, 1953. 


References 


1. Barnard, C. I. The nature of leadership. In 
Hoslett, S. D. (Ed.), Human factors in man- 
agement. New York: Harper & Bros., 1946. 

2. Barnard, C. I. The functions of the executive.
Cambridge: Harvard Univ. Press, 1950. 

3. Cantor, R. R., Jr. An experimental study of a 
human relations training program. Ph.D. the- 
sis, Ohio State Univ., 1949. 

4. Chowdhry, K. Leaders and their ability to 
evaluate group opinion. Ph.D. thesis, Univ. 
of Mich., 1949. 

5. Fleishman, E. A. “Leadership Climate” and su- 
pervisory behavior, Personnel Research Board, 
Ohio State Univ., 1951. 

6. Gibb, C. A. The research background of an in- 
teractional theory of leadership. Aust. J. 
Psychol., 1950, 2, 19-42. 

7. Katz, D., Maccoby, N., and Morse, N. C. Productivity, supervision
and morale in an office situation. Ann Arbor, Mich.: Survey Res.
Center, Inst. Social Res., Univ. of Mich., 1950.

8. Katz, D., Maccoby, N., Gurin, G., and Floor, L. G. Productivity,
supervision and morale among railroad workers. Ann Arbor, Mich.:
Survey Research Center, Inst. Social Research, [...].

9. Levine, [...] to constructive leadership. [...] 5, 46-53.

10. Luszki, [...] empathic ability and social perception. Ph.D.
thesis, Univ. of Mich., 1951.

11. Maas, H. S. [...] factors in lead[...]. J. abnorm. soc. Psychol.,
[...].

12. Meyer, [...] investigation of certain factors related to quality
of work-group leadership. Ph.D. thesis, Univ. of Mich., 1949.

13. Roethlisberger, F. J. Understanding: A prerequisite of leadership.
In McNair, M. P., and Lewis, H. J. (Eds.), Business and modern
society. Cambridge: Harvard Univ. Press, 1938.

14. Ryan, T. A. Work and effort. New York: Ronald Press, 1947.

15. Smith, M. Leadership: The management of social differentials.
J. abnorm. soc. Psychol., 1935, 30, 348-358.

16. Stogdill, R. M. Personal factors associated with leadership: A
survey of the literature. J. Psychol., 1948, 25, 35-71.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954 


An Experimental Evaluation of the Sensitivity of the 
Empathy Test 


Arthur I. Siegel 


Institute for Research in Human Relations, Philadelphia, Pa. 


The Empathy Test (1) is now of interest because of the recently
reported high correlations (2) of this test with sales-managers’
merit rankings of automobile salesmen (r = .71) and with actual sales
records of automobile salesmen (r = .44). The test correlated (3) as
follows with six criteria of success for union business agents: record
for settling grievances and disputes, r = .64; recruitment of new
members, r = .60; per cent vote received in union elections, r = .38;
enforcement of rules and regulations, r = .44; leadership rank,
r = .67; knowledge of supervisory principles, r = .55. The multiple R
with these six criteria was .76.

The authors of The Empathy Test define [...] way: “This unique [...]
‘natural’ leaders, [...] and outstanding [...] ‘put yourself in [...]
establish rapport, [...] feelings, and be[...] the practical [...]
prediction of the other’s behavior . . . individuals who are su[...]
ability are persons who [...] understanding and an[...] other people”
(empha[...]

[The test co]nsists of three sections. [In the first sectio]n the
respondent is asked to rank the popularity of 14 musical types
(polkas, classicals, waltzes, etc.) with non-office factory workers of
the United States. In the second section, the respondent ranks the
popularity of 15 magazines with the average American, and in the third
section the respondent ranks the annoyance magnitude of 15 experiences
(a boisterous person attracting attention, hearing a person chewing
gum, seeing a person’s nose running,

etc.) to persons aged 25-39. Thus, in all of the sections the
respondent is asked to answer not as he would answer, but as the
average person would perform the ranking, and from these rankings an
empathy score is derived.

Although some low correlations have also been reported by Kerr and his
co-workers (1), in view of the high correlations reported it seemed
that some independent experimental evaluation of The Empathy Test was
warranted. Assuming the validity of The Empathy Test and assuming that
clinical psychologists are higher on empathy than experimental
psychologists, then clinical psychologists should score higher on The
Empathy Test than experimental psychologists. This assumption for
clinical psychologists does not seem to be outside the scope of the
definition of empathy as given by the authors of The Empathy Test, and
seems tenable to the present author.


Method 


Form A of The Empathy Test was distributed by mail to 50 “fellows” of
the Division of Experimental Psychology and 50 “fellows” of the
Division of Clinical and Abnormal Psychology of the American
Psychological Association. The sample was obtained by taking every
fifth “fellow” listed in the 1951 APA Directory in the Division of
Experimental Psychology and every t[...] “fellow” in the same
directory in the Division of Clinical and Abnormal Psychology until a
total of 50 names in each division were obtained. In some instances,
no clear address was listed and in th[...]ing direc[...]. A total of
36 of the forms were returned [by] the “ex[perimentalists].” Of these,
only 34 were completely filled out and one [was] received after our
data were already analyzed. Our total N for experimentalists was 33.
A total of 25 out of 26 of the forms [returned] by the “clinicians”
were usable (N = 25). None of the subjects were informed of the
purpose of the experiment until after all of the forms used in the
comparison had been returned.


Results

The Empathy Tests were scored and means and standard deviations
calculated. These data are presented in Table 1.


Table 1

Means and Sigmas of Clinicians and Experimentalists on Empathy Test

                      Mean      Sigma
Clinicians            87.7      [illegible]
Experimentalists      86.7      18.1


The mean Empathy Test score for “experimentalists” was 86.7 while the
mean Empathy Test score for “clinicians” was 87.7. The difference
between the means is not significant. The mean scores obtained would
place both the “clinicians” and the “experimentalists” at the 70th
percentile on The Empathy Test’s norm for college men.

[Since] liberal arts female students score [lower] on The Empathy Test
than liberal arts males, and since 14 female “clinicians” were sent
the test while only one female “experimentalist” received the
questionnaire, the objection may be raised that this sampling
differential operated so as to bias the scores of the groups in favor
of the “experimentalists.” However, if this were the case, the
variance of the clinical group should have been greater than the
variance of the experimental group. The reverse was true.

All of this might indicate that The Empathy Test either measures
something other than empathy, measures empathy plus another variable,
or is not a sensitive instrument.


An alternative explanation has been ad- 
vanced by Kerr, who kindly reviewed an 
early form of the present paper. Kerr points 
out the possibility that the better clinicians, 
possessing a vested interest, may have been 
more defensive about “going out on a limb” 
and thus the better clinicians may not have 
returned the forms. This sampling differ- 
ential may have acted to lower the empathy 
scores of the “clinicians.” The present au- 
thor feels that this explanation is unwar- 
ranted in view of the fact that neither group 
was informed of the purpose of the research 
until after the forms were returned. If the 
clinicians were unaware of the purpose of the 
research, there was little reason for the bet- 
ter clinicians to believe that they were “go- 
ing out on a limb,” and thus withhold the re- 
turning of their forms. 

In fairness to the authors of The Empathy 
Test, we would like to point out that they 
have never claimed that it will distinguish 
between clinical and experimental psycholo- 
gists. Moreover, the assumption that clinical 
psychologists are higher on empathy than ex- 
perimental psychologists was our assumption. 


Summary

The Empathy Test was submitted by mail to a group of
experimental and a group of clinical psychologists. Assuming
that the “clinicians” are higher on empathy than the
“experimentalists,” The Empathy Test did not reflect this
difference.

Received August 24, 1953.


References

1. Kerr, W. A., and Speroff, B. Manual for The Empathy Test.
   Chicago: Psychometric Affiliates, ...
2. Tobolski, ... W. A. Predictive value of The Empathy Test in
   automobile salesmanship. ... 52, 5, 310-311.
3. Van Zelst, R. H. ... of Union Leaders. J. ...

The Journal of Applied Psychology
Vol. 38, No. 4, 1954


The Validation of an “Indecision” Score for Prediction of Proficiency of Foremen


J. P. Guilford 


University of Southern California 


The results to be reported briefly here are 
essentially negative, but perhaps negative re- 
sults should be reported more often than they 
are. On the one hand such a report may
save another investigator from entering the
same blind alley. On the other hand it may
give another investigator an idea for doing a
similar study in a modified way which will
lead to positive results.

The study is also opportunistic, in the
sense that it was not planned in advance but
was possible as a byproduct of another study.
The writer happened to have at his disposal
the answer sheets from more than 400 foremen
in an eastern industrial plant, these foremen
having taken the three personality inventories,
STDCR, GAMIN, and Personnel Inventory.


It can only 
tories were adminis- 


formation concerni 
the ratings, 
have some reliabilj 


against the 
the present 
dual differ. 
The contributor of the data on which this report
is based wishes his organization to remain anonymous.
I am nevertheless grateful to him for making
the data available.


Norms and validities of 16 test 
ing success of foremen. A Mas- 
University of Southern California 




ences in the tendency of the examinees to
use the question-mark response to the items.
It will be remembered that the alternative
responses to the items are “Yes,” “?,” and
“No.” Each examinee could be given a score
according to the number of “?” responses he
gave. It was hypothesized, subject to certain
qualifications to be mentioned later, that
a large portion of the variance of the “?”
score represents a personal trait of indecision.
The greater the number of “?” responses an
individual gives, the greater his degree of
indecisiveness. It was also hypothesized that
indecisiveness is an unfavorable trait for
foremen and it was consequently predicted that
the correlation between this score and the
criterion would be significantly negative.

The “?” score comes in the general category
of response-set scores that are receiving
increasing attention as possible objective
measures of personality traits. The meaning
of such scores, even when they prove to be
highly reliable, must always be questioned.
While the first hypothesis about the “?” score
is that it measures indecisiveness, there can
be other hypotheses, which we will consider.
Indecision can enter into the picture in
more than one way. Let us assume first that
the examinee is cooperative and attempts to
answer each item in the way that most genuinely
describes himself. He is most likely to waver
between responses “Yes” and “No” under
two conditions. One is when he does not
know himself very well with respect to the
question asked. Some “?” responses, of
course, represent complete ignorance or
inability to give one of the other responses.
But when there is partial knowledge, whether
or not he will give the “?” response will
depend upon his readiness to make a more or
less arbitrary choice versus his inclination
not to do so. This is the kind of case whose
behavior one would like to measure by means
of an “indecision” score.

Another occasion for wavering is when the
examinee knows himself well but is himself
near the limen for the item; he is on the
borderline that to him separates “Yes” and
“No.” This kind of indecision, too, we would
like to have included in the measurement,
for it is probably psychologically identical
with the first type mentioned. In this
connection we have the problem of equality of
opportunity for indecision. Presumably, an
examinee who is near the limen for most of
the items has much more occasion for wavering
than an examinee who is decisively on one
side or the other of the trait continuum for
most items. If a “?” score were based upon
the items that are keyed for one trait only, it
can be seen that those who earn moderate
inventory scores have more opportunity
for wavering than those who earn scores at
either extreme. The relation between the
“?” score and the inventory-trait score would
be curvilinear. Since each inventory is scored
for relatively independent factors and the
intercorrelations of scores tend to be small, it
is very unlikely that an examinee will be at
similar positions on all traits. Hence the
opportunities for wavering are somewhat
equalized if we obtain a “?” score from all
items in the inventory combined. Some
inequality of opportunity might well be taken
into account, however, if we want variations
in “?” scores to represent traits such as
indecision, freed from involvement with
pattern or factor scores.
A third occasion for wavering and indecision
may arise among those examinees who
have decided to answer the items not as
they are but as they think will make a good
impression. Here the wavering is with respect
to which is the more desirable response,
“Yes” or “No.” It is possible that without
knowledge of the key and with little
experience in taking inventories, many
instances of liminal alternatives arise. Again
the trait of indecisiveness in which we are
interested would have room for play. The “?”
responses given under this condition should
also indicate the trait we want to measure.

Three possible meanings of the “?” response
have been discussed. All of them, it
has been argued, are potentially contributory
to the indecisiveness variance we want to 
emphasize in the score. Other meanings that 
do not contribute to this variance include 
cases in which the examinee does not know 
the answer to the question and he should
therefore legitimately respond with the “?”
and cases in which he is at or near the limen
for the item and a “?” response represents a
correct position for him between “Yes” and
“No” on the trait continuum. But, as was 
pointed out above, there are individual dif- 
ferences in tolerance of an indecisive re- 
sponse, and this fact makes the “?” response 
contribute to the variance we want. On the 
other hand, there is a possibility that the sig- 
nificant difference here is in the form of will- 
ingness to guess or to gamble versus a caution 
in this regard. This is not logically an as- 
pect of the indecisiveness variable with which 
we are concerned. 

If the lack-of-self-knowledge component of 
the “?” score is appreciable, it would add 
to reliability and also probably to validity 
against the foreman criterion. The greater 
the lack of knowledge of self, the poorer 
should be the chances of success as a foreman 
or of leaders of other kinds. The willingness- 
to-guess component should add to reliability 
but its effect on the “?” score (which is to
reduce that score) would tend to detract from
validity against the foreman criterion, assuming
that good foremen are inclined to be cautious
in a situation like this.


Results


Each of the three inventories was adminis- 
tered as a unit and was given an indecision 
score as a unit rather than factor by factor. 
This was partly to assure a larger range of 
scores and partly to equalize opportunity for 
wavering, as suggested above. It is of in-
terest, first, to determine whether individual
differences in indecision scores are consistent 
from one inventory to another. The inter- 
correlations of scores from the three inven- 
tories provide estimates of alternate-form re- 
liability. 


The frequency distributions of the three
indecision scores all approached the Poisson
type, with modes at a score of zero. The
proportions of zero scores were .41, .48, and
.55. This form of distribution was obtained
under the pressure of the instruction for the 
examinee to avoid the “?” response. It was 
assumed that the underlying trait continuum, 
however, was one on which the distribution 
in the population is normal. Tetrachoric cor- 
relations were therefore computed. They 
were .73, .75, and .88, with an average cor- 
relation (Fisher-Z method) of .80. Since 
these intercorrelations were fairly high the 
three indecision scores were summed to yield 
one score for each examinee. The reliability 
of such a score should be in the region of .90. 
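
As an illustrative check on the arithmetic above, the Fisher-z average of the three reported intercorrelations can be sketched in Python. The Spearman-Brown step is an assumption about how the figure "in the region of .90" was reached; the report does not name the formula, and the function names here are hypothetical.

```python
import math

def fisher_z_mean(correlations):
    """Average correlations by way of Fisher's z transformation."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))

def spearman_brown(r, k):
    """Reliability of a sum of k parts whose mean intercorrelation is r.

    Assumed here as the basis for the '.90' estimate; the report
    does not state which formula was used.
    """
    return k * r / (1 + (k - 1) * r)

intercorrelations = [0.73, 0.75, 0.88]  # tetrachoric values reported in the text
mean_r = fisher_z_mean(intercorrelations)
combined = spearman_brown(mean_r, 3)    # three inventory scores summed

print(round(mean_r, 2), round(combined, 2))
```

Under these assumptions the averaged correlation rounds to the reported .80 and the stepped-up value falls slightly above .90.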

The correlation of this combined indecision
score with the rating criterion was also found
by means of the tetrachoric r. The sample
of 405 foremen was divided into two groups,
one having to do with tools and maintenance
and the other with production. The scatter
plots show no signs of non-linearity. The
validity coefficients were +.14 and -.09
for the two groups, respectively. With Ns
of 119 and 286, these coefficients are statisti-
cally insignificant. They also differ in sign.
We may therefore accept the idea that they 
are random deviations from zero correlation 
and conclude that there is no support what- 
ever for the original hypothesis, 
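
A rough check on the significance statement can be sketched as follows. This is not the author's procedure: it treats each coefficient as if it were a product-moment r and tests it against zero with Fisher's z. Because a tetrachoric r has a larger sampling error than a product-moment r, the test below is, if anything, too lenient. The pairing of +.14 with N = 119 follows the "respectively" in the text.

```python
import math

def z_statistic(r, n):
    """Normal deviate for testing r against zero via Fisher's z."""
    return math.atanh(r) * math.sqrt(n - 3)

def two_sided_p(z):
    """Two-sided probability for a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2))

groups = [
    ("tools and maintenance", 0.14, 119),
    ("production", -0.09, 286),
]

for label, r, n in groups:
    z = z_statistic(r, n)
    print(label, round(z, 2), round(two_sided_p(z), 3))
```

Neither deviate reaches the conventional 1.96 required at the .05 level, which agrees with the conclusion drawn in the text.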

While there is n 
the indecision scor 


worth further 
actical Purposes, 




however, something would need to be done
to improve discrimination at the lowest levels,
where discrimination is now very poor. Had
there been differentiation among those scoring
zero, we might even find some relationship
between scores in that range and the
criterion. It would be more reasonable to
expect the relationship to appear among
scores at the upper levels, however, where
none was found. Since the reasoning concerning
the contributions to variance in the
“?” score indicates several possible traits, a
factor analysis of the score is definitely called
for as a basis for intelligible future predictions.


Summary and Conclusions 


An “indecision” score was obtained by
counting the number of “?” responses to
items in the Guilford personality inventories
STDCR, GAMIN, and Personnel Inventory.
The three scores showed an average
intercorrelation of .80, indicating that they measure
much the same trait or traits. A combination
of these three scores correlated +.14 and
-.09 with a rating of proficiency of foremen
in an industrial plant, whereas a significant
negative correlation had been predicted.
While the indecision score indicates something
stable about individuals, it needs to be
factor analyzed to be understood and used.
Conditions that will assure better discriminations
at the lower levels are needed for a
score of practical use.


Received September 14, 1953.




An Approach to Isolating Dimensions of Job Success ** 


Louis L. McQuitty, Charles Wrigley, and Eugene L. Gaier 


University of Illinois 


Typical currently-employed indices of on-the-job
success do not adequately measure
proficiency because the original job descriptions
do not clearly depict the psychological
requirements of the job being studied. As
research progresses, it is becoming increasingly
obvious that the usual types of job descriptions
are neither rigorous nor analytic
enough to furnish a sound basis for the development
of valid measuring devices. There
is a need for new methods by means of which
the basic dimensions of job success can be
isolated and precisely described.

The present study is the first of a series designed
to investigate the possibility of deriving
job requirements by statistical
rather than by “rational” analysis. The research
plan calls for factor-analyzing descriptions
of on-the-job behavior obtained by
interviewing the peers and supervisors of selected
Air Force Airplane and Engine Mechanics.
The procedure was guided by the following
working hypotheses:


1. Peers and supervisors can select representatives
   of three categories of mechanics, viz., best,
   average, and poorest.

2. Descriptions of representatives of these
   three categories will reflect individual
   differences in psychological variables related
   to job proficiency.

3. Factor analysis of the descriptions will
   assist in understanding some of the psychological
   characteristics related to job proficiency.

4. Ability test items can be prepared which
   measure these psychological characteristics;
   individual differences in responses
   to these items will be related to criteria
   of job proficiency.


1 This study was supported in part by the United
States Air Force under Contract AF 33 (038)-25726,
monitored by the Commanding Officer, Human Resources
Research Center, Attention: Director of Operations,
Lackland Air Force Base, San Antonio,
Texas. Permission is granted for reproduction, publication,
and disposal in whole or in part by or
for the United States Government.

2 The authors wish to express appreciation to
Charles N. Cherry and K. Patricia Cross for
assistance in collecting and editing the
interview materials, to Walter A. Cleven and Malcolm
Helper for carrying out the statistical analyses,
and to Donald R. Shaw for assistance in the
factor interpretation.


If these hypotheses are to prove fruitful, 
the following conditions must be met: (a) 
the descriptions of “best” and “poorest” me- 
chanics should differ significantly; (b) the 
factors deriving from the descriptions should 
be meaningful; (c) the descriptive factors 
should be related to independent criteria of 
job performance, such as their rated job pro- 
ficiency by other informants; and (d) the 
use of the factors as guides in preparing 
ability items should result in tests which are 
more highly related to criteria of proficiency 
than those prepared exclusively by way of the 
job description approach. The first two con- 
ditions and a preliminary investigation of the 
third form the basis of the present study. 


More thorough investigation of the last two
conditions will follow later, provided of course
that the results of the present set of studies
justify it.


The Descriptive Inventory


The present paper reports: (a) the prepa- 
ration of an inventory, called the Descriptive 
Inventory, designed to facilitate the descrip- 
tion of mechanics by their peers and su- 
pervisors; (b) a factor analysis of the results 
obtained when this inventory was used by 
supervisors to describe individuals whom they 
had selected as representative of “best,” “av- 
erage,” or “poorest” mechanics; and (c) an
analysis of the relations of the items to “best”
and “poorest” mechanics.


To obtain items for the inventory, experienced 
mechanics, most of whom had been in super- 
visory positions, were asked to select the “best”
(or “average” or “poorest”) A. & E. mechanic
with whom they had worked within the last two
years. A description was then sought of the be-
havior of this mechanic, both on and off the job,
and the descriptions were subsequently divided 
into separate descriptive phrases. This plan was 
followed because it was believed that an inven- 
tory constructed in terminology familiar to main- 
tenance personnel would be used with more dis- 
crimination by mechanics than would one com- 
posed of more academic and technical phrases. 
Subjects and Procedure. A total of 104 stu- 
dents attending Flight Engineering School at 
Chanute Air Force Base served as subjects for 
the initial phase of this study. Each subject was 
individually questioned in an interview divided 
into two separate sections: (a) a free-response 
phase, in which the subject was asked simply to 
describe a fellow mechanic selected by him to 
represent one of the three categories of profi- 
ciency; and (b) a “structured” phase in which 
comments were elicited in response to specific
questions.

The descriptive phrases thus obtained were
extracted from the typescripts of the recorded
interviews and assembled into a pool, which
numbered in all some 15,000 items. From this
pool a sample was drawn on a random basis
and edited by three psychologists and
five mechanics to the end that: the words
were always those of the interviewee;
each phrase was in the present tense;
items on which ... were eliminated as ambiguous; and
phrases whose meanings were dependent upon
context were rewritten to make their meanings
clear when used in isolation.


First Pilot Study 


Upon completion of the editing, 235 items
remained out of the original sample of 264. 
These items were assembled into an experi- 
mental inventory in which each item re- 
quired either a “yes” or a “no” answer, The 
inventory was administered to Air Force su-
pervisors in order to: (a) obtain comments
as to the meaningfulness and adequacy of 
coverage of the phrases; and (b) to secure a 
preliminary indication of the predictive utility 
of the phrases, To fulfill this latter aim, chi 
square values were computed for each item 
in order to determine whether or not it dis-
criminated significantly between mechanics
selected as “best” and “poorest.”
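
The item test described above can be sketched for a single yes-no item cross-classified against the "best" versus "poorest" selection. The counts below are hypothetical, the function names are illustrative, and the sketch omits any continuity correction, since the report does not say whether one was applied. It also shows the link between chi square and the phi coefficient used later as the criterion correlation (chi square equals N times phi squared).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi square for a 2 x 2 table, no continuity correction.

    a, b: "yes" and "no" counts for mechanics selected as "best"
    c, d: "yes" and "no" counts for mechanics selected as "poorest"
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def phi_coefficient(a, b, c, d):
    """Phi coefficient for the same 2 x 2 table (signed)."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Hypothetical counts for one item, not taken from the study
a, b, c, d = 40, 10, 15, 35
n = a + b + c + d
chi2 = chi_square_2x2(a, b, c, d)
phi = phi_coefficient(a, b, c, d)

# With 1 degree of freedom, 3.841 is the .05 critical value
print(round(chi2, 2), round(phi, 2), chi2 > 3.841)
```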


After these data had been obtained, the items
were considered one at a time with respect to:
(a) the mechanics’ comments; (b) the magnitude
of the chi square values; and (c) the
proportion of subjects answering each response
alternative. Three judges decided whether to
retain, amend, or reject items. Items with a 90%
or more response for either answer alternative
were rewritten to lessen this percentage
whenever this appeared possible; otherwise they
were rejected. Items which supervisors reported to
require information that they did not have were
rejected, and those regarded as difficult to
understand were amended.

In all, 35 items were eliminated in this phase
of the study. The 200 remaining phrases were
assembled in random order into a check list
designated as the Descriptive Inventory. Although
the entire inventory was used in the collection
of data, only the first 120 items were
analyzed in this study, for reasons stated later.
Examples of the items are listed in Tables 1-2.


Use of the Descriptive Inventory 


Our next immediate purposes were: (a) to
isolate relatively independent clusters of
interrelated descriptions; (b) to interpret these
psychologically; and (c) to make a preliminary
investigation of their relation to job
proficiency.


Subjects. The Descriptive Inventory was
administered to 428 Flight Engineering students at
Chanute Air Force Base. Each subject had
completed a course in Airplane and Engine Mechanics
and had had at least six months’ supervisory
experience. In length of line maintenance
experience the subjects ranged from six
months to more than 21 years, with a median of
four years.

Administration. The inventory was administered
in small group sessions (12 to 25 men),
... in length. The instructions were
printed on the face sheet of the booklets and were
read to the subjects before they began work on
the inventory. All of the respondents were given
time to complete the items.

Analysis. In order to reduce the computational
work, the Descriptive Inventory was divided
into two parts for factor analysis by the
shortened square root method developed by
Wrigley and McQuitty 3 (a modification of
Thurstone’s method). In the present paper,
results are reported for the first 120 items
only. (Since the 200 items in the Inventory
were arranged in random order, there should be
no significant difference in the type of items
appearing in the two parts.) Phi coefficients were
used to measure correlations; these and the factor
loadings were calculated on IBM equipment,
using punched card methods developed by
Helper.4

3 Wrigley, Charles, and McQuitty, Louis L. The
Square Root Method of Factor Analysis: A Re-examination
and a Shortened Procedure (Manuscript).

In the shortened square root factor analysis, a “pivot
variable” was selected and factor loadings were
calculated to reduce the correlations or residual
correlations for that variable to zero. The pivot
variables were selected with the aim of enhancing
the likelihood of getting predominantly positive
factor loadings, and of obtaining factors
which are conceptually clear. The pivot variable
was almost always the one with the highest absolute
column sum. The same procedure is then repeated
with another “pivot variable.” The method results
in orthogonal factors, with all factor axes
at right angles to one another.
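
The pivot procedure just described can be sketched in Python. This is a minimal illustration of the general square root (diagonal) method, not the shortened procedure of the cited manuscript. It assumes unities in the diagonal and always takes as pivot the unused variable with the highest absolute column sum, whereas the authors also weighed conceptual clarity. The function name and the small correlation matrix are hypothetical.

```python
import math

def square_root_factors(corr, n_factors):
    """Extract factors one pivot at a time from a correlation matrix.

    Returns (loadings, residual). Each extracted factor reduces the
    pivot variable's residual correlations to zero.
    """
    n = len(corr)
    resid = [row[:] for row in corr]          # work on a copy
    loadings = []
    used = set()
    for _ in range(n_factors):
        # Candidates: unused variables with some residual variance left
        candidates = [j for j in range(n)
                      if j not in used and resid[j][j] > 1e-12]
        if not candidates:
            break
        # Assumed rule: pivot on the highest absolute column sum
        pivot = max(candidates,
                    key=lambda j: sum(abs(resid[i][j]) for i in range(n)))
        root = math.sqrt(resid[pivot][pivot])
        column = [resid[i][pivot] / root for i in range(n)]
        # Remove this factor's contribution from the residual matrix
        for i in range(n):
            for j in range(n):
                resid[i][j] -= column[i] * column[j]
        loadings.append(column)
        used.add(pivot)
    return loadings, resid

# Hypothetical 3 x 3 correlation matrix for illustration only
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
first_loadings, first_resid = square_root_factors(corr, 1)
all_loadings, final_resid = square_root_factors(corr, 3)
print([round(x, 3) for x in first_loadings[0]])
```

Because each factor is computed from residuals left by the earlier ones, the resulting axes are orthogonal, as the text states; extracting as many factors as there are variables reproduces the original matrix.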
In addition to the factor analysis, a pilot
criterion analysis was also completed. Using
only the 204 inventories which described
“best” and “poorest” mechanics, phi coefficients
were computed between each item and
the best-poorest dichotomy. These phi coefficients
are here called criterion correlations.

A total of 18 factors were extracted from
the 120 variable matrix. By this time, the
stage of decreasing returns appeared to have
been reached, as shown by the drop in relative
proportions of variance accounted for by
the 15th, 17th and 18th factors; moreover,
the factors became less obvious in psychological
meaning.

In order to insure that no major group factor
had been omitted, a list was prepared of
all items not appearing within the ten highest
loadings of any one of the first 18 factors.
Further pivots were drawn from this reduced
list. This procedure was designed to guarantee
that some axes passed through that portion
of the hyperplane which had not previously
been traversed. The value of the procedure
in the present study was demonstrated
by the fact that the next factor, the 19th in
order of extraction, proved to be the sixth
in the order of variance. The next four were
less encouraging. Consequently, the factoring
was discontinued at this point. This
made a total of 23 factors extracted.

4 Helper, M. M. Punched-card procedures for
square root factor analysis (Manuscript).


Item Validities 


It will be of interest first to consider the 
phi coefficients between the individual items 
and the “best-poorest” dichotomy for the 204 
mechanics classified in this fashion. These 
criterion correlations range in magnitude from 
.87 to .01 with a mean of .48. Although
these results show some very substantial rela- 
tionships between the descriptive items and 
the “best-poorest” criterion, they cannot, of 


Table 1

Descriptive Inventory Criterion Correlations for Items
with High Predictive Value (φ > .70)

Item   Phi
No.    Coefficient   Item

Items characteristic of good mechanics
 68      .868   He makes sure he does a good job
 11      .835   When he does a job you know it will be done right
 ...     .812   He deserves a promotion
 97      .800   If you leave him to do a job, you can always be sure
                he will get the job done
107      .782   He is good at working on the plane
 ...     .780   He can show you how to do the job right
 ...     .772   He is a good man on any job
 ...     .772   He knows his stuff
119      .768   He seems to take pride in his work
 47      .762   You don’t have to worry about telling him what to do
                all the time
 72      .759   He tries to find better ways of doing things
106      .739   His ambition will pay off
118      .721   He gives good cooperation
 49      .712   He will straighten a guy out and explain things to him
  6      .703   If he were a crew chief, he would work right along
                with his men

Items characteristic of bad mechanics
 12     -.778   You wouldn’t feel safe unless you checked behind him
 27     -.753   Most guys with that much experience know a lot more
                than he does
 10     -.750   He works in a sloppy way
 38     -.748   He isn’t a very careful worker
 41     -.744   He achieves his aim in the wrong way
 69     -.741   He is kind of slipshod in his ways
 30     -.714   He doesn’t have any sense of responsibility


course, be accepted as final evidence that the 
descriptions are related to proficiency on the 
job. They indicate, however, the features re- 
garded by the supervisors as significant. 

Items with highest criterion correlations 
(see Table 1) are mostly generalized descrip- 
tions of behavior on the job, e.g., “He makes 
sure he does a good job”; “When he does a 
job, you know it will be done right.” These 
characterizations, however, give little detailed 
information as to the psychological com- 
ponents of job success. The advantage in 
carrying out a factor analysis of the items is 
that more analytic dimensions are thus de- 
veloped.

The items with the lowest criterion correla- 
tions (see Table 2) are less job centered in 
their orientation, and deal with such traits as 
drinking habits, social demeanor, truthfulness, 
appearance, etc. 

The items with high criterion correlations 
agree for the greater part with the items with 


high loadings on the first factor. This is par- 


20 items in 


the 20 highest 
the criterion. 


ent psychological 
pilot criterion. 


Interpretation of Factors 


As computational methods improve and the 
analysis of more variables for larger numbers 
of subjects becomes practicable, factor ana- 
lysts will probably become accustomed to 
dealing with smaller loadings. In this study, 


Table 2

Descriptive Inventory Criterion Correlations for Items
with Low Predictive Value (φ < .20)

Item   Phi
No.    Coefficient   Item

 54      .014   He is of just average appearance
 55      .026   If there is something he likes to do, he does it
                faster than anyone else would
 61      .179   He doesn’t lose his temper
 37      .192   He would give the shirt off his back
 24      .195   He associates with fellows like himself
117      .199   He is of above average appearance
 32     -.010   He is quite young for his rank
 88     -.028   He doesn’t drink
 26     -.058   He likes to drink on his off-duty hours
 62     -.072   Sometimes he gets “T’d off”
 29     -.077   His basic training was rather short
 58     -.088   We have had a couple of ...
 92     -.148   He appears very rude to a stranger
 9?     -.150   He hasn’t had too much time in service
108     -.193   He doesn’t care to mix with other people

loadings as low as .10 usually appear to be
quite meaningful, and not at all at variance
with the general interpretation of the factors.
The twelve factors accounting for the most
variance are reported here. The sums of
squares of loadings for each of the factors
are presented in Table 3.⁵
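
The two summary figures attached to Table 3 (total sum of squares 61.1084, contribution to variance 50.92%) can be reproduced from the 23 tabled sums of squares. The sketch below assumes that each of the 120 items carries unit variance, so that total variance equals the number of items; the variable names are illustrative.

```python
# Sums of squares of factor loadings from Table 3, in order of size
sums_of_squares = [
    24.7028, 2.8732, 2.3592, 2.3522, 2.0080, 1.9102, 1.8269, 1.7407,
    1.6646, 1.6644, 1.6302, 1.5756, 1.5530, 1.5272, 1.4942, 1.4780,
    1.4096, 1.3614, 1.2952, 1.2469, 1.1953, 1.1584, 1.0812,
]
n_items = 120  # variables in the analyzed matrix, each assumed to have unit variance

total = sum(sums_of_squares)
contribution = 100 * total / n_items          # percent of total variance
first_factor_share = 100 * sums_of_squares[0] / n_items

print(round(total, 4), round(contribution, 2), round(first_factor_share, 1))
```

On this reading the first factor alone accounts for roughly a fifth of the total variance, and the remaining 22 factors together for about three tenths.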

Tables 4 through 15,⁶ one for each of the
twelve factors, report (a) the items with the
ten highest loadings for each factor, positive
loadings first, followed by the negative
ones; (b) the interpretation of each factor;
(c) the phi coefficient between selection
as “best” or “poorest” and the item
response; and (d) the code number of the
item in the inventory. Loadings as low as .10
were accepted here because of the large
number of both variables and subjects (N =
428).

Table 16 lists all 120 items of the Descriptive
Inventory. Table 17 gives the 12 factor
loadings for each of the 120 items. Tables 16
and 17 are also deposited in ADI. (See
footnote 6.)

5 In the use of the square root method, the larger
factors tend to, but do not necessarily, appear before
the smaller. In presenting results here, the factors
have been rearranged in order of contribution to
variance, and to conserve space, smaller factors have
been omitted. The factor loadings are on file
at the Training Research ...

6 Tables 4-15 and 16 and 17 have been deposited
with the American Documentation Institute. Order
Document No. 4248 from the ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, ... remitting in advance
$2.25 for 35 mm. microfilm or $5.00 for
photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.


Table 3

Square Root Factor Analysis: Sums of Squares of Factor Loadings

Factor in    Factor in     Sum of Squares
Order of     Order of      of Factor
Size         Extraction    Loadings

  1            1           24.7028
  2            2            2.8732
  3            4            2.3592
  4           12            2.3522
  5           11            2.0080
  6           19            1.9102*
  7            6            1.8269
  8           10            1.7407
  9            5            1.6646
 10           14            1.6644
 11            7            1.6302
 12           13            1.5756
 13           23            1.5530*
 14           22            1.5272*
 15            8            1.4942
 16           20            1.4780*
 17            3            1.4096
 18            9            1.3614
 19           16            1.2952
 20           21            1.2469*
 21           17            1.1953
 22           18            1.1584
 23           15            1.0812

Total sum of squares       61.1084
Contribution to variance   50.92%

* Pivots for these factors were selected from a reduced list of
variables, viz., those which had hitherto not carried very high
loadings on any factor.


Relation of Factors to Criterion

The problem remains as to whether all factors
described by the mechanics are related
to the criterion. Results may be summarized
by presenting the average criterion correlation
for the 10 items with highest loadings in
each factor. Those factors which appear to
be measuring somewhat the same area of behavior
have been grouped together.

Factor                                            Average Criterion
No.    Factor Title                               Correlation

Aspects of drive and initiative
  1    Sense of responsibility                    .74
 ...   Willingness for work                       .50
  4    Laziness                                   ...
  8    Industriousness                            ...
Aspects of practical efficiency
  6    Failure to use knowledge effectively       ...
 ...   ... workmanship                            .66
 16    Lack of craftsmanship                      .70
Aspects of knowledge and intellectual capacity
 ...   Teaching capacity                          ...
  9    Memory                                     .48
 15    Intellectual capacity                      ...
 ...   Job knowledge                              .47
Aspects of social manner
 12    Personal pleasantness                      .35
 ...   Anti-sociability                           .35
Aspects of interest and morale
 ...   Interest in aircraft maintenance           ...
 13    Lack of morale                             ...
Aspects of character
 ...   Weakness of character                      ...
 ...   ... control                                .10
 20    Lack of self-...                           ...
Other factors
 ...   Inexperience                               ...
 ...   Tendency to mediocrity                     ...

These results appear quite clearcut. The
drive and initiative shown by the mechanic,
on the one hand, and his practical efficiency,
on the other, are most closely related to the
pilot criterion. The seven factors
listed under these two headings have the
highest average for criterion correlations. The
factors dealing with social manner of the
mechanic, his interest in aircraft and in the Air
Force, and his intellectual powers are less
highly related to the criterion. His drinking 
and money habits, and his ability to control 
his temper have little or no relation here. 

The factors which account for more vari- 
ance tend to be more highly related to the 
pilot criterion, as shown by the fact that the 
rank-order correlation between factor vari- 
ance and mean criterion correlation for the 
factors, using all 23 factors, is .51. 
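
The rank-order correlation mentioned above can be sketched as a Spearman coefficient computed from average ranks. The five pairs of values below are hypothetical stand-ins for factor variance and mean criterion correlation; they are not the study's 23 pairs, so the sketch is not expected to reproduce the reported .51.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Rank-order correlation: product-moment r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for five factors (illustration only)
factor_variance = [24.7, 2.9, 2.4, 1.9, 1.1]
mean_criterion_r = [0.74, 0.40, 0.66, 0.35, 0.10]
print(round(spearman_rho(factor_variance, mean_criterion_r), 2))
```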


Discussion 


Interest and Motivation in the Supervisors’
Accounts. In discussing these results, we
must bear in mind that these were the ratings
made by supervisors, and their judgments
may reflect their own conceptions rather than
actual job performance.


judgments, interest and motivation appear as 
the principal factors in the descriptions of 
“best,” “poorest,” and “average” Aircraft and 


principles of mechanics. 


The Place of Mechanical In ormati 
Aircraft and Engi N i =a 


nical knowledge. 
cal learning is nec 
to service an aircraft, but this study has re- 
vealed neither the nature nor the a 
this basic information that i 
Successful mechanic. 
reference to lack of 
Scriptions of bad mechanics. Hence, we may 
assume that: 

than has g 
case; (b) 
in giving to all 


neglected in their descriptions a significant
characteristic in which mechanics differ. In
any case, the restriction of this study to men
already trained minimized the importance of
mechanical knowledge, and consequently
emphasized differences in motivation and
personality. Even if differences of interest and
willingness do not give the whole story, the
factor analysis has made abundantly clear
that they are, at least, the primary variables
in the descriptions by supervisors of mechanics
whom they selected to represent different
levels of efficiency. In other words, the
results support the hypothesis that getting good
Aircraft and Engine Mechanics is not entirely
a problem of accumulating knowledge; it is,
at least in part, a matter of motivation,
interest, and morale.


Summary

Before additional tests designed to predict success of mechanics are written, specific hypotheses are needed outlining the dimensions which enter into job proficiency. The present study attempted to isolate some of these hypotheses by: (a) obtaining descriptions by supervisors, in their own words, of Aircraft and Engine Mechanics (selected to vary in proficiency); and by (b) factor-analyzing a compendium of these descriptions. A square root factor analysis of 120 of these descriptions resulted in the following hypotheses for further study.

1. There are a large number of rather independent dimensions of behavior related to job proficiency. Of the 23 factors extracted, practically all of these were found to be related to differences in mechanics selected as representative of “best” and “poorest” performers by the supervisors who described them.

2. The six most clearly defined of the 23 dimensions were asserted to be: (a) sense of responsibility; (b) interest in aircraft maintenance; (c) willingness for work; (d) -ness and lack of initiative; (e) weakness of character; and (f) failure to use knowledge effectively.

It was concluded that supervisors describe trained mechanics who are selected by them to vary in proficiency much more in terms of interest and motivation than in terms of the amount of job knowledge possessed.

Received August 6, 1953.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954


The Analysis of an Experimental Job Evaluation System
Applied to Enlisted Naval Jobs 1


E. J. McCormick 


Occupational Research Center, 


Purdue University 


and 


Willard E. North 


6563 Research and Development Group,


There has been a growing recognition within the services of the potential usefulness of job evaluation for various purposes. Military job evaluation, for example, can contribute to the more adequate differential qualitative allocation of personnel among the services, to the development of career programs, and to the equalization across the services of the grades and ranks for comparable types of jobs and duties.

This investigation is a pilot study of the job 2 evaluation of enlisted jobs within the United States Navy. Specific purposes of the study were those of identifying the factors which contribute to differences in job values, and of determining the relative importance of each such factor.


1 This article is based on a study carried out by the Occupational Research Center, Purdue University, under the provisions of a research contract between the Office of Naval Research and the Purdue Research Foundation (Contract No. N7onr-39410). The views expressed herein are those of the authors and do not necessarily represent the views of the Navy Department. The authors wish particularly to ... Mr. D. G. Price, ... Qualifications Research Branch, Personnel ... Division, Bureau of Naval Personnel, for ... arrangements for many phases of the investigation.

2 The term “job” does not have a specific connotation in the Navy as it does in industry and business. ... used in this article for purposes of con... Because of ... a man usually must perform ... at different times under ... regular duties (related to his Navy rating); ... that he performs under specific shipboard ... The study reported in this ... largely on what might be ... the “routine” duties of enlisted personnel.


Chanute Air Force Base, Illinois 


Experimental Procedures 


The experimental procedures basically in- 
volved the identification, for one representa- 
tive sample of enlisted jobs, of the factors 
(and of their statistically determined weights) 
which gave the optimum degree of relation- 
ship with criterion values, and the cross vali- 
dation of the results with a second repre- 
sentative sample of jobs. It was hypothesized 
that if a particular collection of factors, with 
their appropriate weights, would predict job 
values with two independent samples, a job 
evaluation system structured on such results 
would be of general applicability to the en- 
tire population of enlisted naval jobs. The 
criterion consisted of rankings of the jobs by 
experienced naval personnel on over-all diffi- 
culty and responsibility. The evaluations of 
the jobs on the various factors were made by
Navy job analysts.


The Samples of Jobs 

In this investigation the “population” of naval jobs was considered to be those defined in The Manual of Enlisted Navy Job Classifications (4).3

Job Dimensions Considered. In order to ob- 
tain representative samples, the following four 
job dimensions were considered: (1) Job group 
(14 groups representing different areas, such as 
quartermaster jobs, electronics jobs, etc.); 4


3 The following groups of jobs were excluded from ...: exclusive emergency service jobs (jobs of restricted scope that typically exist ... conditions of full mobilization); ... one rating; and specialists (job specialties, such as divers, for which some ... qualified, and which they may be called upon to ... now and then in addition to their r...). The total population of jobs after ...

4 The Manual of Enlisted Navy Job Classifications classifies jobs in ... For the purposes of this investigation they were divided into 14 groups.



(2) Job levels (three levels, namely: Basic, Journeyman, and Supervisory); (3) Branch of service (Aviation versus Non-aviation); and (4) Job location (where the job typically occurs, namely: Shore, Shipboard, and Shipboard-Shore).5

Selection of the Two Samples. The experi- 
mental and hold-out samples were then individu- 
ally so selected that each sample included per- 
centages in each category of the four dimensions 
which approximated the corresponding percent- 
ages in the total population. For the later pur- 
pose of deriving criterion values, these two sam- 
ples were combined, making a total of 103 jobs. 
Sixteen “extra” jobs were then added to these 
samples for purposes to be described later, mak- 
ing a total of 119 jobs. 


The Criterion 

The “validation” of industrial job evaluation 
systems usually is in terms of the extent to which 
a given system results in a satisfactory degree of 
relationship with prevailing wage or salary levels. 
Since naval pay for various grades and ranks is 


) i i » and a 
These instructions asked the 


This system included the following factors: (1) Work Knowledges Required; (2) Inherent Job Hazards; (3) Guidance and/or Supervision Received*; (4) Responsibility for Supplies and Equipment*; (5) Non-hazardous Working Conditions; (6) Physical Effort Required*; (7) Responsibility for the Safety of Others*; (8) Guidance, Supervisory and Command Responsibility*; (9) Potential Combat Hazards and Hardships; (10) Physical Skill*; (11) Mental Demand; (12) Military and Working Conditions*; (13) Attention. Eight of the factors (those marked with an asterisk) were




The Job Analysts Who Served as Evaluators. The jobs were evaluated by experienced job analysts in the naval service. The experimental and hold-out samples were evaluated, at different times, by 32 and 13 analysts respectively; some analysts were included in both groups and evaluated both samples.

Method of Evaluation. Each analyst was given definitions of the jobs in the sample in question, a set of definitions of the thirteen experimental factors, a set of instructions, and a set of record sheets. The instructions provided for the analyst first to select, from the sample in question, the definition sheets of the jobs which he felt he could rank relative to other jobs on the various factors. Instructions provided then for ranking these selected jobs on each of the 13 factors. For this purpose the “Rank Comparison” system described by Bittner and Rundquist (1) was used. This system provides, in general, for the division of the items into subgroups, the ranking of the items within each subgroup, and the subsequent “merging” of the subgroups.


Results

I. Criterion Scale Values

In order to derive scale values for the 103 original sample jobs as such, the rankings by the judges of the 16 “extra” jobs were disregarded; this was done for each judge by assigning ordinal rank orders to the sample jobs in the order in which he ranked them, exclusive of any of the “extra” jobs which he had also ranked.

Original Criterion Scale Values. Since each judge ranked only part of the sample jobs it was necessary to take this into account in deriving criterion scale values. A method described by Guilford (3, pp. 256-257), appropriate to such situations, was used. These scale values, numerically, ranged from ... (high) to 3.2037 (low).

Consistency of Rankings by Judges. In order to get an estimate of the consistency of the rankings of each judge with the rankings of the entire group of judges, a rank order correlation (rho) was computed for each judge between the order in which he ranked the jobs and the order of those same jobs in the complete array, when the scale values of the jobs he ranked were put into ordinal rank sequence. These rank order correlations ranged from .60 to .94, with a median of .86. While all of these rhos were statistically significant (at better than the one per cent confidence level), it was decided, in the interests of criterion stability, to drop the rankings of the judges with the lowest critical ratios. Subsequent analyses were made on the basis of the rankings of the remaining 26 judges; their median rho was .87.

While such consistency does not provide proof of the validity of the judgments of job values, it lends support for the use of such judgments as a criterion, in the absence of any “true” criterion of naval job values.
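The consistency check described above can be illustrated with a short sketch. It assumes the ordinary rank-difference formula for rho and rankings without ties; the job names, composite scale values, and judge ranking are hypothetical and are not taken from the study.

```python
def ranks_from_scale_values(values):
    """Return ordinal ranks (1 = highest scale value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(ranks_a, ranks_b):
    """Rank-difference formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical composite scale values for the six jobs one judge ranked.
composite_scale_values = {
    "job_a": 7.1, "job_b": 6.4, "job_c": 5.0,
    "job_d": 4.2, "job_e": 3.9, "job_f": 3.3,
}
# Hypothetical order in which that judge ranked the same jobs (best first).
judge_order = ["job_a", "job_c", "job_b", "job_d", "job_f", "job_e"]

jobs = list(composite_scale_values)
group_ranks = ranks_from_scale_values([composite_scale_values[j] for j in jobs])
judge_ranks = [judge_order.index(j) + 1 for j in jobs]
rho = spearman_rho(judge_ranks, group_ranks)
print(round(rho, 3))
```

Only the jobs a judge actually ranked enter his coefficient, which mirrors the extraction of his jobs from the complete array before ordinal ranks are assigned.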
The Job Samples. In addition to analyzing the reliability of the criterion rankings made by the several judges, an analysis was also made of the consistency with which the individual sample jobs were ranked. While this analysis will not be described in detail, suffice it to say that the 15 jobs that were ranked with the least consistency were dropped from the samples, and were replaced, where possible, with “extra” jobs with similar dimension characteristics. Such replacements were not possible for all the jobs dropped, however, and the two samples were reduced to 58 and 37 jobs respectively. The jobs in these two groups (giving a total of 95) were the ones used later in the analysis of the various evaluation factors.

Final Criterion Scale Values. Criterion scale values were then recomputed using these 95 jobs as ranked by the 26 judges mentioned above.
II. Analysis of Factor Evaluations on Experimental Jobs

Tentative Rank Orders on Individual Factors. The factor rankings of the job analysts were used first in deriving tentative rank orders of the 58 experimental jobs on each of the 13 factors. The method presented by Guilford (3, pp. 256-257) for use in deriving scale values from several sets of incomplete rankings involves the intermediate computation of “probability” values. These probability values have the same rank orders as do the final scale values, and were therefore used as the basis for determining these tentative rank orders.

Reliability of Evaluation by Job Analysts. The following reliability analysis was made


individually for each of the 32 job analysts. 
The jobs which each analyst ranked were first 
extracted from the complete array of the 
tentative rank orders on each factor; these 
selected jobs were then assigned ordinal rank 
orders on each factor in the sequence in which 
they had been extracted. For each factor a 
rank order correlation (rho) was computed 
between the ordinal rank orders of the jobs 
which the analyst ranked as extracted from 
the complete array of jobs, and the rank or- 
ders of those same jobs as he had ranked 
them on the factor in question. 

The rhos for each analyst on the 13 fac- 
tors were then converted to Fisher’s z values. 
The 13 z values for each analyst were then 
averaged, and these averages were then re- 
converted to rho correlations. These aver- 
age rhos ranged from .60 to .89 for the vari- 
ous analysts, with a mean and median for all 
analysts of .81. 
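The averaging step just described (rho to Fisher's z, mean of z, back to rho) can be sketched as follows. The four rhos are hypothetical stand-ins for one analyst's coefficients on several factors; the article does not report the individual values.

```python
import math


def average_rho_via_fisher_z(rhos):
    """Average correlations by converting to Fisher's z, averaging, and converting back."""
    z_values = [math.atanh(r) for r in rhos]   # z = 0.5 * ln((1 + r) / (1 - r))
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                   # inverse transformation back to a correlation


# Hypothetical rhos for one analyst on four of the factors.
hypothetical_rhos = [0.60, 0.75, 0.82, 0.89]
average_rho = average_rho_via_fisher_z(hypothetical_rhos)
simple_mean = sum(hypothetical_rhos) / len(hypothetical_rhos)
print(round(average_rho, 3), round(simple_mean, 3))
```

For positive correlations the z-based average is never smaller than the simple arithmetic mean, because the transformation stretches values near 1.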

The average rhos for the individual ana- 
lysts were then subjected to a statistical 
analysis to determine the extent to which 
they differed from those of the other ana- 
lysts. The seven analysts whose average 
rhos differed most from those of the remain-
ing analysts were considered as candidates
for being dropped for the subsequent analy- 
ses. Four of these analysts were dropped. 
The other three, however, had each ranked 
certain jobs which in turn had been ranked 
by limited numbers of other analysts; it was 
therefore considered desirable to retain the 
evaluations of these three analysts. The av- 
erage rhos of the 28 analysts retained ranged 
from .71 to .89, with a mean (computed from 
Fisher’s z values) of .82. While these values 
should be considered as being approximations 
rather than as precise indexes of the reliability 
of the analysts, the general level of the re- 
liability compares rather favorably with that 
which is typically obtained in industrial job evaluation studies.

The rhos of the 28 analysts for each of the 13 individual factors were then averaged, using Fisher’s z values. These average rhos ranged from .64 to .88 for the various factors, with an average rho for all factors of .82.

Final Scale Values on Individual Factors. 
The rankings of jobs on the 13 factors by the 
28 job analysts were used as the basis for de- 




Table 1

Correlations of Factor Scale Values with Criterion Scale Values for Final Experimental Sample

Factor No.    r        Factor Name
 1             .954    Work Knowledges Required
 2             .193    Inherent Job Hazards
 3            -.854    Guidance and/or Supervision Received
 4             .631    Responsibility for Supplies and Equipment
 5            -.033    Non-hazardous Working Conditions
 6            -.139    Physical Effort Required
 7             .248    Responsibility for the Safety of Others
 8             .629    Guidance, Supervisory and Command Responsibility
 9             .048    Potential Combat Hazards and Hardships
10             .420    Physical Skill
11             .156    Mental Demand
12             .089    Military and Working Conditions
13             .505    Attention


In order to determine the factors and their weightings which gave the optimum degree of relationship with the criterion scale values the Wherry-Doolittle test selection method was used. This method is described by Garrett (2, pp. 435-558). The results of this analysis are given in Table 2. This table shows the factors in the sequence in which they were selected, including the shrunken multiple correlation (R) with the criterion of each of the selected factors along with all previously selected factors. This table also gives the Beta weights and the derived “b” weights for the individual factors selected. The first five factors selected gave an R of .968. The addition of the sixth factor caused no increase in the R, indicating that the first five factors by themselves gave the optimum degree of relationship with the criterion scale values. The unshrunken multiple correlation (R) of these five factors with the criterion was .970.

Table 2

Shrunken Multiple Correlations with Criterion of Factors in Order of their Selection, with Beta and b Weights, for Final Experimental Sample

Factor Name                                Factor No.    R       Beta Weight    b Weight
Work Knowledges Required                    1            .954     .7664          .3508
Guidance and/or Supervision Received        3            .963    -.2443         -.9631
Potential Combat Hazards and Hardships      9            .964     .0645          .242
Inherent Job Hazards                        2            .966    -.1359         -.4010
Responsibility for the Safety of Others     7            .968     .0990
Physical Effort Required                    6            .968

It will be observed that Factor 1 (Work Knowledges Required) by itself gave a correlation with the criterion of .954, indicating that this single factor accounted for a very high proportion of the variance in criterion scale values. Factor 3 and Factor 2 entered into the prediction of criterion values with negative weightings. The negative relationship of Factor 3 (Guidance and/or Supervision Received) is readily understandable; but it is interesting to note that this factor “came through” while Factor 8 (Guidance, Supervisory and Command Responsibility) did not. The inverse relationship of Factor 2 (Inherent Job Hazards) is consistent with the typical findings of industrial job evaluation studies.

III. Cross Validation with Hold-out Job Sample

Derivation of “Predicted” Criterion Scale Values Using Selected Factors. The five factors identified as giving the optimum degree of relationship with criterion values of the experimental jobs were used in computing “predicted” criterion values of the 37 hold-out jobs. The first step involved in this process was that of computing scale values for the 37 hold-out jobs on each of the five factors, using the procedures previously described. The “b” weights for these factors were then incorporated in a regression equation, along with the derived constant (K = .2418) in order to obtain predicted criterion values.
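The form of the regression equation described above, a constant plus the weighted sum of a job's scale values on the five selected factors, can be sketched as follows. Only the constant K = .2418 is taken from the article; the “b” weights and the job's factor scale values below are hypothetical placeholders chosen for illustration.

```python
K = 0.2418  # derived constant reported in the article

# HYPOTHETICAL "b" weights keyed by factor number (1, 3, 9, 2, 7 are the
# selected factors); these are illustrative, not the article's weights.
hypothetical_b_weights = {1: 0.40, 3: -0.20, 9: 0.05, 2: -0.10, 7: 0.08}


def predicted_criterion(factor_scale_values, b_weights=hypothetical_b_weights, constant=K):
    """Predicted criterion value = K + sum of (b weight * factor scale value)."""
    return constant + sum(b_weights[f] * factor_scale_values[f] for f in b_weights)


# HYPOTHETICAL scale values of one hold-out job on the five selected factors.
hypothetical_job = {1: 5.0, 3: 2.0, 9: 1.0, 2: 3.0, 7: 4.0}
predicted_value = predicted_criterion(hypothetical_job)
print(round(predicted_value, 4))
```

In the cross-validation step such predicted values, one per hold-out job, are then correlated with the independently derived criterion scale values.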

Correlation between Predicted and Actual Criterion Values. The predicted criterion scale values for the 37 hold-out jobs were then correlated with their previously determined actual criterion scale values. This correlation was .937. This correlation is of such a magnitude that it gives assurance that the selected factors account for a very substantial proportion of the criterion variance.


Conclusions

The following conclusions seem warranted on the basis of the results of the investigation:

1. The criterion ranking of the sample jobs by a number of representatives of the naval service reflect fairly stable concepts among them with respect to relative values of enlisted naval jobs; in the absence of any true criterion of naval job values, such reliable judgments may well be accepted as a criterion for use in job evaluation research.

2. The rankings of the sample jobs by job analysts on the 13 factors in the experimental job evaluation system resulted in a satisfactory degree of reliability.


3. Five of the 13 factors accounted for a 
very large proportion of the variance in cri- 
terion scale values. For the experimental 
sample, the shrunken multiple correlation of 
these five factors with the criterion was .968. 
For the hold-out sample, the “predicted” cri- 
terion scale values (predicted by the use of 
a regression equation) gave a correlation of 
.937 with the actual criterion scale values.
The third, fourth, and fifth factors identified 
by the Wherry-Doolittle test selection method 
(Factors no. 9, 2, and 7) added only slight 
increments to the shrunken multiple correla- 
tion; because of this, it is very probable that 
the first two identified factors (Factors no. 1 
and 3) would themselves adequately predict 
the criterion scale values, although the pre- 
dictive value of these two by themselves was 
not determined in the study. 

4. The results of the investigation were of 
such a nature as to suggest that a job evalua- 
tion system structured on the basis of these 
results could be expected to be of general ap-
plicability to the entire population of enlisted
naval jobs.
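The relation between the unshrunken (.970) and shrunken (.968) multiple correlations reported for the five selected factors can be illustrated with a Wherry-type shrinkage correction. The exact formula applied in the study is not stated, so the form below, with N − 1 over N − m where m is the number of selected factors, is an assumption; other presentations use a slightly different denominator.

```python
import math


def shrunken_multiple_r(r, n_cases, n_predictors):
    """Wherry-type correction (assumed form): R'^2 = 1 - (1 - R^2) * (N - 1) / (N - m)."""
    adjusted_r_squared = 1 - (1 - r * r) * (n_cases - 1) / (n_cases - n_predictors)
    return math.sqrt(max(adjusted_r_squared, 0.0))  # guard against a negative estimate


# Values from the article: unshrunken R = .970, 58 experimental jobs, 5 factors.
estimate = shrunken_multiple_r(0.970, n_cases=58, n_predictors=5)
print(round(estimate, 3))
```

Under this assumed form the estimate agrees with the reported shrunken value to three decimals; because the reported R is itself rounded, the agreement should be read as approximate.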


Received August 10, 1953. 


References 


1. Bittner, R. H. and Rundquist, E. A. The rank-comparison rating method. J. appl. Psychol., 1950, 34, 171-177.

2. Garrett, H. E. Statistics in psychology and education (Third Edition). New York: Longmans, Green and Co., 1947.

3. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Co., Inc., 1936.

4. Manual of Enlisted Navy Job Classifications (NAVPERS 15105, Revised). Bureau of Naval Personnel, Navy Department, July, 1949.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954


Comparability of Personal Attitude Scale Administration With
Mail Administration With and Without Incentive 1


Paul W. Maloney 


The Addison Lewis Company, 


Industrial psychologists have two obliga- 
tions. First, the techniques we use must 
yield valid results. Second, we must be sure 
that the use of these techniques is economi- 
cally feasible. 

The attitude scale exemplifies an effective 
device whose utilization has been restricted 
by cost considerations, Where group ad- 
ministration is possible attitude scales are 
being used extensively. But many “groups” 
in the social sense are not grouped geographi- 
cally. In this latter case the attitude scale 
is generally administered by personal inter- 
view. And that’s where the cost per subject 
zooms upward. 

There's no doubt 
tration would be 
But we have been 
ministering attitud; 
questioned the vy; 
trolled technique, 

This study was planned to test validity. We wanted to determine the comparability of three methods of attitude measurement: one by a personal interview, the other two by mail.

Method 

A total of 127 j 
terviewed. A 19-item Likert- 
scale formed part i 


The 
sample was fixed 


Interviewers were 
when the second 
No records were 
or the number of 


A slightly abridged form of the questionnaire used by the interviewers was mailed to 148 subjects. (The attitude scale was not abridged.) Half received a 25¢ piece as in-

1This study was Part of a total communications 
analysis of a public utility. The subjects were all 
customers of the public utility. The scale concerned 


customer attitudes toward service, public ownership 
and company personnel, 




Minneapolis, Minn. 


centive; this group was sent one blanket follow-up letter. The other 74 subjects were just asked for cooperation; here there were two follow-ups to non-respondents.


Results 


The questionnaire without the quarter received a 58% return. The quarter brought back 86% of the questionnaires.

The same attitude scale had been personally administered to other groups. On the basis of 175 schedules (including those discussed above), the eight most discriminating items were selected. Table 1 shows the average scale values which the three groups made on these eight items.

The two mail techniques produced almost identical average values. Attitudes denoted by the personal interviews, however, were considerably lower.

As a further test of comparability, two sets of correlations were computed. The first set dealt with the incidence of the median “Undecided” response. The per cent frequency of this response was computed for each of the 19 items, for all three groups of subjects. These percentages were correlated. The results comprise Table 2.

All relationships were strong. Especially comparable are the two mail surveys. Apparently incidence of “Undecided” does vary item-by-item among the methods of administration.

Table 1

Administration                        Average Scale Value
Personal Interview (N = 127)          2.32
Mail, with Quarter (N = 64)           2.38
Mail, without Quarter (N = ...)       ...


Table 2

Correlations of the Incidence of the “Undecided” Response in Three Administrations

Comparison                                     Pearsonian Correlation
Personal Interview with Mail-Quarter           .84
Personal Interview with Mail-Non-Quarter       .83
Mail-Quarter with Mail-Non-Quarter             .95

A second set of correlations was also designed to test comparability. For each item, the three (out of five) least favorable responses were grouped as non-favorable. The percentages of these non-favorable responses were calculated for each item, for all three groups. The correlations of these percentages are shown in Table 3.

Once again the correlations were high. The two mail techniques were again most similar. Note that the “Undecided” response was one of the three non-favorable. This means that these correlations were to some extent a corollary of the relationships shown in Table 2.


Table 3

Correlation of the Incidence of Non-Favorable Responses in the Three Administrations

Comparison                                     Pearsonian Correlation
Personal Interview with Mail-Quarter           .84
Personal Interview with Mail-Non-Quarter       .81
Mail-Quarter with Mail-Non-Quarter             .92


Conclusions 


The healthy return indicates a possible 
economy in mail administration of attitude 
scales. The per cent of mailed questionnaires 
returned was influenced by a financial incen- 
tive; results were not. Mailed administra- 
tions denoted higher attitudes than the per- 
sonal interview. Though this difference in 
scale values was pronounced, item-by-item 
ups and downs were the same with all three 
types of administration. 

Of course, all of these conclusions must be interpreted carefully. Individual attitude scale administrations are specific unto themselves. But if the findings from other studies are similar, we may be able to consider the mailed attitude scale a good tool. The higher scale values are puzzling. But the item-by-item similarity is encouraging. We may find that the technique can be used, providing the denoted attitudes are depressed to some extent. What that depression should be we cannot, of course, say at the present time.


Summary 
Residential customers of a public utility
were administered an attitude scale. Three
methods of administration were used: per- 
sonal interview; mail with financial incentive; 
and mail without financial incentive. The 
responses obtained by each method were com- 
pared. The three were found to be reason- 
ably comparable. 


Received September 3, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954


An Empirical Analysis of the Effectiveness of Psychological 
Warfare * 


Thomas G. Andrews 


University of Maryland 


Denzel D. Smith 
Office of Naval Research 


and 


Lessing A. Kahn 
Operations Research Office, The Johns Hopkins University 


Of the many reports on Psychological War- 
fare (PW) relatively few have been directed 
toward analyzing hypotheses about the funda- 
mental nature of PW and the ways in which 
it acts upon the individual recipient. There 
are, of course, many reasons for this situa-
tion. Unfortunately, this is one of the many
cases in which we have had to employ a sys- 
tem without full information of the system 
and how it works, The authors were engaged 
by the Operations Research Office to design 
and carry out a research evaluation of several 


aspects of PW in Korea, especially to deter- 
mine certain of th 


tudes, Motives, 


s role, is well taken ¢ 


are of 
no state of fear, and 


physically, is in is in 




complete accord with the war aims and ideology of the forces of his nation, will not be sensitized to the content of PW. On the other hand, the person who is at the opposite poles of these characteristics may be so ready to surrender or to show defection, or whatever our PW is designed to produce, that he does not need the propaganda. PW was thus thought of as having mainly a nudging or precipitating effect on behavior and as somewhat secondary to the preparatory effects of the more physical and material aspects of warfare.


Criteria and Factors Studied 


Because the research was to be carried out in Korea and by interrogation of Chinese and North Korean Prisoners of War, the criteria used were restricted by these available conditions. As criteria the research used the degree of willingness to surrender peacefully as contrasted with having required forceful capture and the degree of disaffection shown.

Attempts were made to identify several factors which would serve as estimates of the individual’s position along the general continuum of receptiveness to PW, and to include estimates of behavior that were hypothesized as important conditioners of the criterion behavior. The following factors were chosen for the investigation:

A. Degree to which the individual, ... the war, was in accord with the ideology and war aims of the P...

B. ... individual had experienced ... fear during battle.

C. Degree to which the individual felt he was poorly treated and physically cared for by his own forces during the war.

D. The amount and intensity of direct battle experience the individual had.

E. The total amount of U. N. Forces propaganda of any kind received by the individual during the war.

F. The total amount of U. N. Forces propaganda by medium received by the individual during the war: (1) leaflets; (2) loudspeaker; and (3) ...

G. The proximity of the propaganda received to action in front line battle.

H. Degree of defection or change in the individual’s accord with the aims and operations of his military forces. (criterion)

I. Degree to which the individual was willing to, or sought to, surrender peacefully as opposed to forceful capture at the time he was taken prisoner. (criterion)


Question forms were designed to measure the relative position of a person on each of the nine factors. A combination of techniques was used in estimating such positions. Several questions composed of alternative responses were written on each factor, and the alternative responses were designed in such a way as to reflect a level of intensity or amount of the experience or attitude being assessed. Examples of such items are given below:


A-16. How would you characterize yourself in terms of actions to uphold the principles of the Peoples Government?

___ (1) Tried to be critical and show others the faults of communism.
___ (2) Was neutral and took no action one way or another.
___ (3) Was active in furthering the principles of the Peoples Government.
___ (4) Believed so firmly was willing to fight for these principles.

C. As the war progressed, to what extent did your living conditions such as food, clothing, comfort, and medical care change?

___ (1) Was always able to get along fairly well.
___ (2) Things were bad but never unbearable.
___ (3) At times things were nearly unbearable.
___ (4) Conditions became completely unbearable.


The questions on each factor were grouped 
together. For the end of each of these sec- 
tions a 7-point rating scale was designed with 
descriptive anchors for each point regarding 
relative position on the scale for the particu- 
lar factor involved. In addition to these more 
formalized approaches an open-end interview 
system was devised for each of the nine factors.

Procedure


The question forms were translated into 
Korean and into Chinese and printed in those 
languages. Through the cooperation of the 
Army officials in Japan and Korea the au- 
thors went to Korea, where a group of native 
Korean college graduates was selected and 
trained to serve as interviewers, interpreters, 
and translators. These men were trained in 
the procedures of the standardized interview. 
Military arrangements were made to allow 
the authors and the native members of the re- 
search team into the Prisoner-of-War camps 
in Pusan, and to furnish groups of POWs se-
lected according to several criteria.2

In the interview sessions rapport was es- 
tablished with relative ease, and certain con- 
ditions were arranged to assess the veracity of 
the reports. The interviewer in each case, 
after explaining the general procedure, read 
each question and the alternate responses,
and checked the response selected by the 
prisoner. Any seemingly important discussion 
that was raised about an item was also re- 
corded by the interviewer. At the end of 
each section of the interview, the rating scale 
was described in detail and the prisoner indicated
his judged position on it. After each rating scale
was used, the interviewer discussed the prisoner’s
experiences in a general way to probe for further
comments and descriptions relating to that
particular factor. Full notes were taken on these
open-end parts of the interview, and the interviewer
made his own rating of the prisoner based on his
comments and discussion. This process was continued
through the schedule of nine factors. The papers
were then translated back into English and brought
back to the United States for analysis.

² The exact nature of the criteria of selection
cannot be specified here, nor can the authors
describe the prisoners other than by indicating
they were made up of several hundred Chinese and
North Koreans captured or surrendering during
military operations.

A single numerical index was desired for 
each prisoner on each of the nine factors. A 
group of research assistants worked on the 
forms to obtain a single rating on a new nine- 
point scale for each factor for each of the 
cases. These new ratings were based on 
judgments considering all the information re- 
corded. Each form was 


ting value 
exceeded one scale point, the raters held dis- 


cy. Com- 


Results 


The resulting data were processed by IBM


Thomas G. Andrews, Denzel D. Smith, 


and Lessing A. Kahn 


the finding of any correlation above .30 was
satisfying.

The coefficients in rows H and I indicate
the factors that relate to defection behavior
and willingness to surrender respectively.
Those factors that correlate with one of the
two criteria also correlate in the same direction
and general magnitude with the other,
and the two criterion scales correlate highly
with one another. This expected consistency
of results for the two criterion scales tends
to indicate a core of reliability and credibility
of the results for these two scales.

Three of the factors for which scales had
been constructed and which had been estimated
to have some influence on defection
and/or willingness to surrender did not bear
any significant relation to the two criteria.
These particular scales were constructed as
estimates of fear (B), amount of PW received
by radio (F-3), and the relative proximity
of the PW received to operations in
front line battle (G). The items used for
intensity of fear probably did not work with
Orientals: they did not correlate well with
any other measures. The prisoners reported
that they did not have radios available,
so scale F-3 could not be expected to yield
results.


The morale factors contained in scales A
and C correlate higher with the criteria, H
and I, than do the Psychological Warfare
factors E and F1.
Table 1

Obtained Correlations Among Specified Attitudes and Experience for North Korean and
Chinese Prisoners of War*
A B c D - mes — a I 
War aims A = E F1 pa T3 G 
Fear B .02 ax 
Bad treatment C —.59 15 
Battle exp. D -16 OF = 08 
PW rec'd E —.18 02 4 an 
» Leaflets F-1 —.15 06 2 an 
Loudsp’ker F-2 —.14  ~99 31 -89 — 
Radio F-3 —08 ~ 49 03 HG 52 ss 
PW proximity G 04 13 04 = 09 10 as 
Defection H —.58 03 152 a 28 Si ee es a 
Surrender I -—59 05 AG 20 30 i 10 En 71 
es, er, vee OB Se ae 
* Correlations higher than .100 are significant at the 1% level on a two-tail test.


This result was expected, as is the case in any
large-scale [...]. The concern here is the
extent to which accuracy of prediction of defection
and surrender from amount of PW received
is so contaminated with the other factors
of morale that one must say that
PW bears no significant or demonstrable influence.
Attempts were made to analyze this
problem by means of partial and multiple
correlation.
The correlation between PW (factor E)
and defection (H), partialling out accord with
War aims (A), changes the correlation of .31
to .26, which is still significant. When estimates
of bad treatment (factor C) are also
partialled out, the second-order partial correlation
of PW and defection (E and H) reduces
only to .22, which is still statistically
significant. It would seem that Psychological
Warfare does offer some effective influence
on the Oriental troops, independently of
its conjoint action with lowered morale. This
net influence does not, however, appear
significant for predicting relative willingness to
surrender peacefully. When the morale factors
are partialled out, the predictable effect
of PW on surrender behavior reduces to a
second-order partial correlation of only .09.
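The first-order partial correlation quoted above can be checked by hand. The sketch below is an illustration only: it assumes the usual first-order partial correlation formula and takes the three zero-order values as they can be read from the text and from the legible cells of Table 1 (.31 for E with H, -.18 for A with E, -.58 for A with H). The function name `partial_corr` is hypothetical, not the authors'.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations as read from the text and Table 1 (assumed legible).
r_EH = 0.31   # PW received (E) with defection (H)
r_AE = -0.18  # war aims (A) with PW received (E)
r_AH = -0.58  # war aims (A) with defection (H)

r_EH_given_A = partial_corr(r_EH, r_AE, r_AH)
print(round(r_EH_given_A, 2))  # 0.26, the value reported in the text
```

The second-order value of .22 would follow by applying the same formula again to first-order partials with factor C, but the zero-order correlations involving C are not all legible here, so that step is not reproduced.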
Within the correlation table obtained there
are several values that stand out as provocative
to consider. Only certain of the relationships
are summarized here. Because estimates
of chronologically antecedent behavior
are being dealt with here, it is more than
usually tempting to attribute causality to
the results. Analysis of the interrelationships
shown in Table 1 appears to indicate that defection
and surrender are behavior patterns
that are not expected in the more seasoned
troops of high morale, but are predictable
among the troops of lower morale. This,
of course, is practically an established principle
based on rationalization and experience
in warfare. However, the fact as brought
out here serves to demonstrate some reliability
of the data obtained and to corroborate
the view that morale is a primary target
of Psychological Warfare. At least with
Orientals, merely reiterating the desirability
of surrender and giving suggestions about defection
to enemy troops is not enough. It is
also important to note in Table 1 that the
morale factors A and C correlate significantly
with the total amount of PW received (E).
This result may mean either that the PW was
destructive of morale or that lowered morale
sensitized the troops to the PW.

It was desired to determine whether there 
are any influences of PW that are independ- 
ent of the morale determiners of defection 
and surrender behavior. Through second- 
order partial correlations one set of results 
was obtained, as previously described. Fur- 
ther analysis in terms of multiple regression 
was used to throw light on this problem. 

The multiple correlation of factors A, C,
D and E with the criterion H, defection, is
.66, and the regression equation in Beta-form
for this criterion is presented below. The
Beta-coefficients serve to indicate relative
contribution of the factors mentioned:


H' = -.37A' + .23C' - .14D' + .24E'
I' = -.42A' + .15C' - .25D' + .19E'


With the criterion I, willingness to surrender,
the multiple correlation with A, C, D
and E is .65, and this regression equation in
Beta-form is shown above. Comparison of
these Beta-coefficients again indicates some
of the differential influence of these particular
“determining” factors.
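To show how Beta-coefficients and a multiple correlation arise from a correlation table, here is a minimal sketch for the two-predictor case only. It assumes the standard closed-form solution for two standardized predictors and uses the three correlations among A, E and H that are legible in the text and Table 1; it does not reproduce the authors' four-predictor equations, whose remaining inputs cannot be read reliably here. The function name is hypothetical.

```python
import math


def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized regression weights and multiple R for two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    # With standardized variables, R squared is the sum of beta * validity.
    multiple_r = math.sqrt(b1 * r_y1 + b2 * r_y2)
    return b1, b2, multiple_r


# Predicting defection (H) from war aims (A) and PW received (E) alone.
beta_A, beta_E, R = two_predictor_betas(r_y1=-0.58, r_y2=0.31, r_12=-0.18)
print(round(beta_A, 2), round(beta_E, 2), round(R, 2))  # -0.54 0.21 0.62
```

With only two predictors the weights differ from those in the equations above, as expected, since factors C and D also carry part of the prediction there.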

When the measure of defection and the
factor of number of leaflets received are added
to the regression equation predicting surrender,
the multiple correlation is .76.
This correlation is for the prediction of sur- 
render behavior from a knowledge of all the 
other important factors. With variables of 
the type used and obtained under the rela- 
tively poor field conditions of measurement 
that necessarily existed in this study, a multi- 
ple correlation of .76 is extremely high. Of 
course, it contains the contribution of one cri- 
terion in predicting the other criterion, in the 
amount of .71. However, the defection atti- 
tudes were presumably antecedent to the sur- 
render or capture of the troops becoming pris- 
oner. In so far as each of the variables other 
than the scale for surrender is a measure of 
some behavior occurring prior to the final 
actual surrender or forceful capture of the
prisoners, this correlation of .76 would appear
to indicate that peaceful surrender may be to
a great extent dependent on and predictable 
from the particular forms of attitudes and ex- 
periences measured by the scales devised for 
this study. It is understood that this study 
requires cross-validation and also replication 
with other groups. 


Summary 


Standardized interviews on North Korean 
and Chinese prisoners of war were carried out 
to test the relative importance of several atti- 
tudes and experiences in determining the de- 
fection attitudes on the part of the captive 
troops and their willingness to surrender 
peacefully at the time they were taken pris- 
oner. Among the experiences assessed was 
the amount of tactical psychological warfare 
the troops had received from the United 
Nations before becoming prisoners of war.
“Scores” on each factor and experience were
derived as well as on the two criteria of de- 
fection and willingness to surrender. 

The primary results are presented in a correlation
matrix, which is analyzed for certain
relations and with respect to the general hypothesis
that psychological warfare is effective
in changing behavior, but its effects are
mainly of a precipitating nature that is differential
for persons more sensitized to it by
their morale and experiences.

The primary correlations, certain partial
correlations, multiple correlations, and standard
multiple regression coefficients were analyzed
and appeared to corroborate the major
hypothesis. Additional relationships of possible
military and social importance are deducible
from the data obtained.


Received September 10, 1953.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954


Predicting Achievement in Medical School: A Comparison of
Preclinical and Clinical Criteria *


Robert Glaser and Owen Jacobs


American Institute for Research, Pittsburgh, Pa. and University of Pittsburgh


A previous article in J. appl. Psychol.
(3) reported on the predictive efficiency
of a trial selection test battery administered
to an entering class of medical
students. The criterion against which the
tests were validated was the general grade average
at the end of the first year of medical
school. Criterion data for later medical school
performance have become available and provide
an opportunity for a follow-up of this
earlier article.

Validation studies of medical aptitude test
batteries often employ first-year medical
grades as the criterion variable. The
use of these grades as an intermediate criterion
of medical success is defensible since
successful completion of the first year is
a prerequisite for continuance in medical school
and the drop-out rate may be higher during
this year than other medical school years.
However, the usual question concerning the
relationship between an intermediate criterion
and the ultimate one still remains. A medical
school curriculum can usually be divided
into the first two preclinical years and the last
two clinical years. The performance of
students in the latter two years, presumably,
is generally more similar to their performance
as physicians than their performance in the
preclinical years. It has been pointed out by
Stalnaker (5) of the American Association of
Medical Colleges that “. . . the grades given
in professional schools may have a special
meaning. In the two preclinical years where
basic science courses are usually taken, medical
schools have one teacher for each 4 to 5
students. In the two clinical years, one
teacher is used for each 1 to 2 students.
Some schools have more full time teachers
(or their equivalent) in the clinical years
than they have students. Many - most - of
these clinical teachers are part time and
many are voluntary, i.e., unpaid. Grades
given under these conditions may have special
meaning.” The purpose of the present
study is twofold: first, to investigate the
relationship between preclinical and clinical
grades; and secondly, to compare the validities
of an aptitude test battery when the criterion
variable consists of preclinical grades
on the one hand and clinical grades on the
other.

* Revised from a paper presented at the meetings of
the Midwestern Psychological Association in Chicago,
May.

This paper reports data obtained from a
class of 129 medical students at the Indiana
University School of Medicine. At the beginning
of the first year of medical school,
150 students were enrolled in this class; at
the end of the third year 129 students remained.
The two criteria employed were the
general grade averages at the ends of the first
and third years of medical school. These
general averages are weighted averages of the
grades obtained by a student in specific
courses. Weights are assigned according to
the amount of time a course meets. The specific
courses in the first and third years are
listed in Table 1.
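The weighting scheme described above amounts to an ordinary weighted mean. The following is a minimal sketch under stated assumptions: the course grades and meeting hours are invented for illustration, and `weighted_average` is a hypothetical helper, since the article does not give the actual weights used at the school.

```python
def weighted_average(grades, hours):
    """Mean of course grades, each weighted by the time the course meets."""
    if not grades or len(grades) != len(hours):
        raise ValueError("grades and hours must be non-empty and equal in length")
    total_hours = sum(hours)
    return sum(g * h for g, h in zip(grades, hours)) / total_hours


# Hypothetical student: a 3-hour course graded 90 and a 1-hour course graded 80.
example_grades = [90.0, 80.0]
example_hours = [3, 1]
print(weighted_average(example_grades, example_hours))  # 87.5
```

A course that meets longer pulls the general average toward its own grade, which is why the unweighted mean of 85 differs from the weighted result.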


The Relationship between Criteria 


The correlation between the two grade averages
is .54. The mean of the first-year averages
is 87.4; the standard deviation is 4.1.
For the third-year averages the mean is 88.7
and the standard deviation is 2.0.

The correlations of the specific course
grades in the first year with the specific
course grades in the third year are presented
in Table 1. In Table 1 the third-year courses
are arranged in order of the size of their mean
grades. This makes apparent a noticeable
trend in these correlations. As the mean
grade decreases, more and more of the correlations
between the first- and third-year
courses become statistically significant. Of



Table 1


Correlations Between First-Year and Third-Year Grades with Means and
Standard Deviations of These Grades


                               First-Year Courses

                                 Neuro-            Gross
Third-Year Courses     Physiol.  Anat.    Histol.  Anat.     M      S.D.
Psychoneurosis           .15      .03      .10      .14     96.3    ...
G. U. Surgery            .22      .20      ...      .21     95.2    ...
Ophthalmology            .19      ...      .16      ...     92.6    2.3
Dermatology              .03     -.02      .04     -.10     91.8    6.0
Industrial Med.          .14      .18      .02      .12     89.7    4.4
Epidemiology             .21      .06      .08      .05     89.5    4.2
Clin. Path. Lab.         .31      .39      .20      .29     89.2    4.2
Pathology                .19      .20      ...      .14     88.7    1.9
Clin. Neurol.            .10      .06      .19     -.04     88.2    3.6
Obstetrics               .24      .21      .21      .24     88.0    5.3
Anesthesia               .13      .14      .09      .09     87.2    6.8
Clin. Psych.             .28      .24      .21      .12     87.2    2.2
Clin. Diagnosis          .32      .30      .39      .25     86.7    ...
Cardiology               .48      .42      .44      .38     86.5    7.3
Pediatric Lect.          .28      .31      .08     -.02     86.2    ...
Surgical Path.           .26      .35      .40      .30     85.8    ...
Medicine Recitation      .36      .31      .37      .30     85.6    ...
Anatomy                  .26      ...      .36      .36     83.0    ...
M                       88.5     88.1     86.6     86.3
S.D.                     5.6      4.8      3.8      4.4

the six third-year courses with the highest
means, none correlates significantly with any
of the first-year courses. Of the six third-year
courses with the lowest mean grades,
four of the six correlate significantly with all
four of the first-year grades; one of the
[...] significantly with three of
the first-year courses; [...]
third-year courses. [...]
of this sample it [...]
the third-year courses,
grading takes place on a different basis than
when the mean of the grades is lower.


Comparison of Validities for the Two Criteria

A trial aptitude battery was administered
to the medical students at the beginning of
the first year of medical school. The tests in
the battery were the following:

1. The Differential Aptitude Tests Space
Relations (2). This test consists of items
which require two-dimensional figures to be
translated into their corresponding three-dimensional
objects. The rationale behind this
test was that an important requirement of the
medical student appears to be the ability to
translate his two-dimensional text-book illustrations
into three-dimensional life objects.

2. The United States Armed Forces Institute
Tests of General Educational Development,
College Level, Test Three: Interpretation
of Reading Materials in the Natural Sciences
(6).

3. The Miller Analogies Test (4).

4. The Army General Classification Test,
AGCT (1).

The validities for each test for the first- and
third-year general grade averages are given in
Table 2. The changes in the coefficients from
the first to the third year are not significant.
Table 3 presents the first and third year
validities for the Medical Professional Aptitude
Test (7) which was administered to the
students in this class. The changes in the
coefficients from the first to the third year
are not statistically significant.

For the trial test battery the multiple correlation
of the battery with the first-year


Table 2


Intercorrelations, Validity Coefficients, Means and Standard Deviations
for the Tests in the Trial Battery


n Intercorrelations Validities 
est 
LRG ï a 3 
adi = : a 
2, Site. Cremeans = ae 23 oS : = 
bA = ae 3 28 2 t A 
Pace Relations — 31 .04 .13 ia re 
= 16 13 1 a 
š 62.1 15.4 


criterion is [...], with the third-year criterion .39.
The multiple correlation of the best predictors
in the Professional Aptitude Test with
the first-year criterion is .43, with the third-year
criterion .39. (It should be pointed out
that some prior selection had taken place on
the Professional Aptitude Test upon entrance
to medical school. The primary interest in
this paper, however, is the change in validity
and not the absolute value of the coefficients.)
An indication of the overlap between
groups when selection is based on each of the
two criteria can be obtained by using the regression
equations to predict the preclinical
and clinical grades for each student and then
correlating the two sets of predicted grades.
When this is done for the trial battery the
correlation is .40. With this degree
of relationship it is possible that, for this
kind of test battery, selection based upon preclinical
or clinical criteria can make substantial
difference in the groups selected.

The results comparing preclinical and
clinical grades show no significant
change in test validity. The tests predict

Table 3

Correlations between PAT Scores and General Averages
and Means and Standard Deviations
of the PAT Scores

r 


Ti 
est Se ist 3rd 

rba] Abili E year year M SD 
jenti ity 3 

Son tifi 
na 2s .27 5620 822 
Com anistic is as 3165 753 
Quan er%site .22 .20 514.3 86.8 
Inge tativ 32 27 5345 TAL 
Mog P oe 3g 29 5413 80.5 
ee Sto Ability 33 31 5392 TLI 
ical Sei 30 27 5335 70.5 
menga w 3g 568 67.5 


preclinical and clinical achievement equally
as well. For the trial test battery, comparison
of the predicted scores for the
two different criteria indicates that the groups
selected on the basis of each criterion may
be quite different. It can be assumed that
achievement in the clinical years is a better
indication of performance as a physician than
achievement in the preclinical years. If this
is the case, then a selection test battery
should consist of predictor variables which
concentrate upon predicting achievement in
the clinical years. Along these lines future
test development might well be devoted to
the following: (a) the isolation of behaviors
which are unique to clinical achievement as
compared with preclinical achievement; (b)
the development of reliable measures of these
criterion behaviors; and (c) the development
of testing techniques to predict these behaviors.
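The overlap check described above, correlating the grades predicted from two regression equations, can be computed from correlations alone when scores are standardized. The sketch below is an assumption-laden illustration: it uses a two-test battery and invented validities, because the Table 2 values are not legible here, and the function names are hypothetical.

```python
import math


def betas(validities, r12):
    """Standardized regression weights for a two-test battery."""
    v1, v2 = validities
    denom = 1 - r12 ** 2
    return ((v1 - v2 * r12) / denom, (v2 - v1 * r12) / denom)


def composite_cov(b, c, r12):
    """Covariance of two weighted composites of the same standardized tests."""
    return b[0] * c[0] + b[1] * c[1] + r12 * (b[0] * c[1] + b[1] * c[0])


def predicted_score_overlap(valid_pre, valid_clin, r12):
    """Correlation between grades predicted for two different criteria."""
    b = betas(valid_pre, r12)
    c = betas(valid_clin, r12)
    return composite_cov(b, c, r12) / math.sqrt(
        composite_cov(b, b, r12) * composite_cov(c, c, r12)
    )


# Hypothetical validities: test 1 predicts the preclinical criterion better,
# test 2 the clinical criterion; the two tests intercorrelate .30.
overlap = predicted_score_overlap((0.40, 0.20), (0.20, 0.35), 0.30)
print(round(overlap, 2))
```

When the two criteria weight the tests identically the overlap is 1.0; the further the weights diverge, the lower the correlation and the more the two selected groups would differ.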


Received September 19, 1953. 


References 

1. Army General Classification Test, First Civilian
   Edition. Science Research Associates, 1947.
2. Bennett, G. K., Seashore, H. G., and Wesman,
   A. G. Differential Aptitude Tests: Space Relations.
   The Psychological Corporation, 1947.
3. Glaser, R. Predicting achievement in medical
   school. J. appl. Psychol., 1951, 35, 272-274.
4. Miller, W. S. Manual for the Miller Analogies
   Test. The Psychological Corporation, 1947.
5. Stalnaker, J. M. Validation of Professional Aptitude
   Batteries: Tests for Medicine. Princeton:
   Educational Testing Service, 1950.
6. United States Armed Forces Institute Tests of
   General Educational Development, College
   Level, Test Three: Interpretation of Reading
   Materials in the Natural Sciences. The American
   Council on Education, 1943.
7. Vaughn, K. W. The interpretation and use of the
   Professional Aptitude Test. A manual for committees
   on admission in colleges of medicine.
   The Graduate Record Office, January, 1947.


THE JOURNAL OF APPLIED PSYCHOLOGY


Vol. 38, No. 4, 1954 


Subscore Patterns on ACE Psychological Examination Related 
to Educational and Occupational Differences 


Francis J. Di Vesta


Syracuse University ¹

The present study presents further data on
a line of investigation opened up by Munroe
(1). In the original study the hypothesis
was tested that something of the dynamics of
personality might be revealed in the difference
between the “Q” and “L” scores (Q-L)
derived from the American Council on Education
Psychological Examination (ACE).
The present study is reported here because
the findings, derived through pattern analysis
as suggested by Munroe, provide implications
for personality study and test interpretation.


Background


The rationale behind the hypothesis presented
by Munroe was based upon studies of
scatter analysis, particularly those of Rapaport
(2) on the Wechsler-Bellevue scale.
Her procedure was to administer ACE and
[...] difference between the individuals’ “Q” and
“L” percentile standings. On the basis of
[...] whether the [...] to be: (a) significantly
more V entries (lack of accurate form) for
the higher L group; (b) significantly more F
entries (responses in which form was the determinant)
for the higher Q group; and significantly
more M (movement) entries for
the higher L group. Description of the two
groups based on differences in Rorschach
indices might be as follows: The higher Q
group gave responses which show objectivity
through elaboration by careful observations
of objective details, formal intellective approach,
repressive efforts at control of affect
and inhibition of normal creative imagination
and of normal structuring of perception.

¹ This study was conducted while the author was
employed by the Human Resources Research Institute,
Maxwell Air Force Base, Alabama.

The higher L group gave responses which 
show subjectivity and imagination through 
cues serving as springboards to new ideas, 
lack of objectivity, creative organization and 
a subjective approach, sometimes to excess- 
It should be noted that Munroe did not 
verify the findings for either group by cross
validation.

Roe (3) indicates that the syndrome rep- 
resented by the higher Q group is similar to 
that found in paleontologists as reflected in 
their Rorschach Protocols. The presentation 
of this fact was further supported by citing 
evidence from an unpublished study by Mun- 
roe wherein it was found that in a sample of 
college students the higher Q group tends to
choose more scientific and art subjects while
in college than does the higher L group.


Statement of the Problem

The summary of findings presented above
would seem to indicate that the Q-L constellation
(or pattern) may be related to differences
in the utilization of [...]
from differences [...]. However,
since Munroe’s [...]
groups at the extremes of a continuum (the
top and bottom quartiles of the distribution
of Q-L [...]) and since they were a unique
[...] desirable to test the original
[...] evidence from other populations.

Procedure 


ACE was administered to subjects in an advanced
military school. The average age of
the sample was 32 years. They averaged
two years of college education, and were
professionally well established.

In calculating the difference between the
“Q” and “L” scores a minor variation of the
procedure used by Munroe was made. This
variation was made in determining the relative
standing of the individuals through the
use of the standard score rather than in terms
of the percentile standing. Standard scores
for the “Q” and “L” scores were computed
on the basis of the distribution of the groups
to which the ACE tests were administered.

All calculations in the study were based on 
the scores of the entire population except in 
individual cases where complete data were 
not available. Where groups were dichoto- 
mized on the Q-L variable all cases which 
had a difference greater than zero, regardless 
of how small the difference, were placed in 
either the high Q or high L categories. 
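The scoring procedure above can be sketched in a few lines. This is an illustration under assumptions, not the author's computation: it uses the population standard deviation of the tested group, treats an exact zero difference as a tie (the text does not say how such cases were handled), and does not reproduce whatever rescaling gave the reported Q-L means of about 50. The raw scores and function names are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw):
    """Convert raw scores to z-scores within the tested group."""
    m = mean(raw)
    s = pstdev(raw)
    return [(x - m) / s for x in raw]


def classify_q_minus_l(q_raw, l_raw):
    """Label each person by the sign of standardized Q minus standardized L."""
    labels = []
    for zq, zl in zip(standard_scores(q_raw), standard_scores(l_raw)):
        difference = zq - zl
        if difference > 0:
            labels.append("higher Q")
        elif difference < 0:
            labels.append("higher L")
        else:
            labels.append("tie")  # assumption: zero differences left unassigned
    return labels


# Three hypothetical examinees.
print(classify_q_minus_l([10, 20, 30], [30, 20, 10]))
# ['higher L', 'tie', 'higher Q']
```

Standardizing within the group, rather than using published percentiles, means that any difference from zero, however small, already places a person in one category or the other.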


Results 


The first step in the analysis was to determine
the relationship between the ACE “Q,”
“L,” total and Q-L scores for the population
studied. These relationships are shown in
Table 1.

The major hypothesis was that the Q-L
pattern is related to the kinds of occupational
and curricular choices made by individuals.
A sub-hypothesis was formulated that pilots
would have different Q-L patterns than non-pilots.
The rationale was that whether an
individual becomes a pilot is initially a matter
of choice although the non-pilot population
may not be as homogeneous with respect

Table 1

Intercorrelations of ACE Scores

                    ACE Score
ACE Score      “L”     Total     Q-L
“Q”            .57      ...      .50
“L”                     .94      .40
Total                           -.08



Table 2


Q-L Scores Obtained by Flying and Ground Personnel


                                   Aero-Rating

                     Flying Pilots    Flying Non-Pilots      Ground
Q-L Score      N      N   Per Cent      N   Per Cent      N   Per Cent
Higher Q      220    154     57        16      43         50     36
Higher L      226    116     43        21      57         89     64

Total         446    270    100        37     100        139    100


χ² = 16.86; df = 2; p = .001.


to choice. Thus, the pilot population may be
considered to represent a group of individuals
who expect to utilize their intelligence and
skills in a specified manner. Accordingly, because
the demands of the pilot’s job are
highly technical and require a high degree of
objectivity it was expected that more pilots
would be in the higher Q group than non-pilots.
Because the non-pilots are in administrative
positions it was hypothesized that they
would tend to be in the higher L group.

The number of pilots, flying officers other
than pilots (navigators, bombardiers and observers)
and ground officers in the higher Q
and higher L groups are shown in Table 2.
The chi-square for this distribution is significant
(p < .01). The greatest contribution to
chi-square was found in the difference between
the numbers of the non-flying individuals
in the higher Q and higher L groups,
although each of these classifications contributed
to the total chi-square.
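The chi-square for the Table 2 counts can be recomputed directly. The sketch below assumes the ordinary Pearson chi-square for a contingency table and takes the cell counts from Table 2; the count of 21 higher-L flying non-pilots is obtained by subtraction from the column total of 37, since that cell is not legible. The function name is hypothetical.

```python
def chi_square_by_column(table):
    """Pearson chi-square for a 2 x k table, with each column's contribution.

    table maps a column label to (higher_q_count, higher_l_count).
    """
    row_totals = [sum(cells[i] for cells in table.values()) for i in range(2)]
    grand_total = sum(row_totals)
    contributions = {}
    for label, cells in table.items():
        column_total = sum(cells)
        contribution = 0.0
        for i, observed in enumerate(cells):
            expected = row_totals[i] * column_total / grand_total
            contribution += (observed - expected) ** 2 / expected
        contributions[label] = contribution
    return sum(contributions.values()), contributions


observed = {
    "Flying pilots": (154, 116),
    "Flying non-pilots": (16, 21),  # 21 derived from the column total of 37
    "Ground": (50, 89),
}
total, parts = chi_square_by_column(observed)
print(round(total, 2))
for label, value in parts.items():
    print(label, round(value, 2))
```

On these counts the total comes out close to the reported 16.86 (about 16.89, the gap presumably due to rounding), and the ground officers contribute the largest share, in line with the statement above.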

Since data on another population of about 
400 more personnel were available, it was de- 
cided to duplicate the first analysis as a 
check. Trends in this second population were 
as distinct as the ones shown in Table 2. The 
chi-square was lower but was significant at 
the .02 level of probability. The consistency 
in the two populations is attributable to the 
fact that the major contribution to chi-square 
is made by the differences between the num- 
ber of non-flying personnel in the higher Q 
and higher L categories. F ratios for the 
ACE “Q,” “L” and total scores were not significant
for these populations.


A second sub-hypothesis was made that 
there would be a difference between the num- 
ber of reserve officers and regular officers in 
the higher Q and higher L categories. The 
rationale was that the regular officers rep- 
resented a homogeneous group of individu- 
als who selected the Air Force as a career, 
whereas, the reserve officers represented a 
more heterogeneous group of individuals with 
respect to making a career in the Air Force. 
In the first sample there was a tendency for 
more regular officers to be located in the 
higher Q category than in the higher L cate- 
gory and more reserve officers to be located 
in the higher L than in the higher Q category.
The chi-square for differences in these
distributions was not significant (p < .20 >
.10). The same tendency existed in the second
population but again the difference in the
distributions was not significant (p > .20)
by the chi-square test. These findings
indicate that the Q-L constellation does not
differentiate between these two groups.

A third sub-hypothesis was that there
would be differences between groups of individuals
in different areas of greatest job experience.
This hypothesis, as in the cases of
the other sub-hypotheses, [...]
because the area of job experience [...]
represent the kind of occup[...]
involving planning [...]
be filled by individuals [...]
and job areas [...] technical routine
requirements would be filled by individuals
with higher Q scores.

The means and standard deviations of the
Q-L scores for individuals in each job area
are shown in Table 3. The F ratio for the
data in this table is 2.07 (p < .05 > .01).
The data clearly indicate that the Q-L scores
differentiate between individuals in the maintenance-inspection
type of function and the
individuals in the intelligence-comptroller
type of function. Those in the former functions
tend to have higher Q scores and those
in the latter functions tend to have higher L
scores. Means for the second population were
calculated, and the F test was again applied.
The F ratio for the variances between groups
in the second population was found to be significant.
In addition, the F ratio for variances
between the two populations was also
found to be significant. Differences between
the populations were found to be attributable
to the Q-L scores for individuals in the
maintenance and operations areas. The mean
for the maintenance subjects in the new population
was 49.1 ± 1.6 and the mean for the
operations subjects was 47.9 ± .8.

The original “Q” and total scores of the
ACE failed to discriminate between subjects
in the various job areas. The F ratio for
each of these sets of scores was not significant
(p > .05). The F of 2.1,
however, was significant (p < .05) for the
original “L” scores of subjects in these job
areas. In decreasing order from high “L”
scores to low “L” scores, as shown in Table
4, were these job areas: intelligence, research,
communications, comptroller, supply, personnel,
administration, operations, inspection and
maintenance. There is some tendency for the
ACE “L” score to discriminate between the
job areas for this population in the same order
that the ACE Q-L score does. There is,
however, enough difference to indicate that
something different is being measured by the
Q-L pattern from that being measured by
the “L” score.
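An F ratio of the kind reported above can be recovered from group sizes, means and standard errors of the mean, the three quantities tabled for each job area. The sketch below is a generic illustration under assumptions: it uses invented groups rather than the tabled values, several of which are illegible, and it assumes each standard error is the group standard deviation divided by the square root of N. The function name is hypothetical.

```python
def f_ratio_from_summary(groups):
    """One-way analysis of variance F from (n, mean, standard_error) triples."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Group variance is n * se**2 when se = sd / sqrt(n).
    ss_within = sum((n - 1) * n * se ** 2 for n, _, se in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Three hypothetical job areas of ten people each.
example = [(10, 50.0, 1.0), (10, 53.0, 1.0), (10, 47.0, 1.0)]
print(f_ratio_from_summary(example))  # 9.0
```

The same arithmetic shows why small groups with large standard errors, such as job areas with only eight or nine members, add little to the ratio even when their means sit far from the rest.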


Table 3


Mean Q-L Scores of Individuals in Ten
Occupational Areas


                                    Standard
Occupational                        Error of
Area               N      Mean      Mean

Maintenance        37     53.9      1.2
Inspection          9     51.2      2.0
Communications      8     51.0      2.8
Research            8     50.9      3.2
Operations        138     50.3      0.7
Personnel          35     49.1      1.4
Administration     67     48.7      1.1
Supply             12     48.5      2.6
Intelligence       24     45.0      ...
Comptroller         9     44.1      2.6
All other          44     49.4      ...
Unknown             5     45.6      ...


F = 2.1; p = < .05 > .01.


Table 4

ACE Total and Subscore Means of Personnel in Ten Occupational Areas

                          “L” Score       “Q” Score      Total Score
Area               N      M     S.E.      M     S.E.      M      S.E.

Maintenance        37    67.4   ...      40.8   1.5     108.2    3.5
Inspection          9    68.8   ...      39.2   2.1     108.0    6.9
Communications      8    77.7   5.7      43.5   3.0     121.3    7.9
Research            8    80.8   4.4      45.0   2.3     125.8    5.0
Operations        138    ...    1.5      41.5   0.8     116.6    2.1
Personnel          35    ...    2.7      40.7   1.5     116.3    3.8
Administration     67    ...    2.1      40.1   1.3     115.6    3.1
Supply             12    ...    3.4      40.6   2.2     117.3    4.5
Intelligence       24    84.8   3.0      41.6   1.9     126.4    4.4
Comptroller         9    77.2   5.1      36.7   3.2     113.9    8.2
All other          44    78.2   2.4      42.9   1.4     120.1    ...
Unknown             5    86.4   4.6      42.0   3.9     128.4    8.3

F =                      2.1             0.6             1.4
p =                  < .05 > .01        > .05           > .05

A [further] hypothesis was that the Q-L pattern would discriminate between individuals who chose different fields of specialization in college. This variable was selected because it too [represented] a situation in which choice was a [...]. Fields of specialization in college were grouped into five classifications for purposes of the analysis. The science group was represented by such college majors as psychology, geology, mathematics and zoology; the arts group by majors in liberal arts, history, music and English literature; the technical group by majors in the applied areas (excluding engineering) such as education, law, social casework and agriculture. The engineering and business administration groups were composed of majors in these respective fields of specialization.

The percentage of subjects in the higher Q group and in the higher L group for each of the five types of college majors is shown in Table 5. The difference in proportions of higher Q and higher L subjects in these classes of college majors is significant (p < .02 and > .01) by the chi-square test. These findings do not support those of Munroe and Roe, who found in their studies that students selecting science subjects tended to have higher Q than higher L scores.

The F ratios between fields of specialization for the “Q,” “L,” and total ACE scores were significant (p < .01). These are shown in Table 6.

Students who had specialized in science, 
arts, and engineering achieved higher scores 
on the ACE “L” score than did those who 
majored in technical and business adminis- 
tration courses. Students who majored in 
technical courses achieved lower scores on 
the ACE “Q” scores than did any of the 
other groups. Majors in engineering, arts 
and science achieved higher total scores than 
did those with business administration and 
technical fields of specialization. 


Table 5

Proportions of Students in Each Field of Specialization
with Higher Q and Higher L Scores

                                   Per Cent in    Per Cent in
                                   Higher Q       Higher L
College Major               N      Group          Group

Arts                        83     43             57
Sciences                   100     48             52
Technical                   74     54             46
Engineer                    94     57             43
Business Administration     66     70             30

Total                      418

χ² = 12.4; n = 4; p = <.02 >.01.
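The chi-square comparison summarized under Table 5 can be retraced roughly from the printed percentages. The sketch below is only an approximation: the cell counts are rebuilt by rounding the percentages, the Engineer higher-Q figure is assumed to be 57 (so that the row sums to 100), and the helper name `chi_square` is illustrative rather than anything the article defines.

```python
# Chi-square test of independence for a 5 x 2 table, computed from first principles.
# ASSUMPTION: cell counts are rebuilt from the Table 5 percentages by rounding,
# and the Engineer "higher Q" percentage is read as 57 so the row sums to 100.

table5 = {
    # college major: (N, per cent in higher-Q group)
    "Arts": (83, 43),
    "Sciences": (100, 48),
    "Technical": (74, 54),
    "Engineer": (94, 57),
    "Business Administration": (66, 70),
}


def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df


observed = []
for n, pct_q in table5.values():
    higher_q = round(n * pct_q / 100)      # approximate count in higher-Q group
    observed.append([higher_q, n - higher_q])

chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Because the counts are recovered from rounded percentages, the statistic should be expected to land near, not exactly on, the printed value; with 4 degrees of freedom the .02 and .01 critical values are about 11.67 and 13.28.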


Table 6

ACE Total and Subscore Means for Personnel with Different College Majors

                                  “L” Score      “Q” Score            Total Score
College Major              N      M      S.E.    M            S.E.    M            S.E.

Science                   100     78.8   1.6     45.2         0.9     124.1        [illegible]
Arts                       83     80.1   1.8     46.2         1.0     125.8        [illegible]
Technical                  74     73.0   1.9     [illegible]  1.0     [illegible]  [illegible]
Engineer                   94     78.3   1.7     47.8         1.0     126.0        2.5
Business Administration    66     72.0   1.8     45.4         1.0     117.4        [illegible]

F Ratio                           3.96           3.75                 3.71
p                                 <.01           <.01                 <.01
Summary

The present study was conducted to examine the findings of Munroe that the high Q and high L patterns (derived from ACE sub-scores) reflect different personality syndromes. To demonstrate whether the original findings would apply in a different research situation the Q-L scores were studied in relationship to occupational and educational differences.

A sample of Air Force officers was used in the present study. The criteri[a ...] [assi]gnment to the regular or reserve Corps; the officer’s career field; and the officer’s college major. [The findings] were that: pilots tended to have higher Q scores, non-pilots had higher L [scores; ...] arts and sciences had higher L scores whereas individuals [...] areas, engineering and business administration had higher Q scores. No difference was found in the Q-L patterns of reservists and regular officers.

Although these data in[dicate a relationship] between the Q-L patte[rn ...], the original ACE scores were not found to be consistently related to these situations. No differences were found between the ACE “Q,” “L,” and total scores of flying and ground personnel. The ACE “Q” and [total] scores did not discriminate between [individu]als in different job areas although the [“L”] score did discriminate in about the [same or]der as the Q-L pattern. Each of the [original] ACE scores was related to the college major but not in the same manner as was the Q-L pattern score.

It would appear from this evidence that there is a relationship between the ACE Q-L pattern and the utilization of intelligence. As further studies are [...] or another form of pa[ttern ...] it is reasonable to expect [...] [revea]led through test score p[atterns ...] [throu]gh independent scores. Certainly the [...] of a clinical approach to the understanding [of the dyna]mics underlying pattern [...] [mus]t be considered a useful [...] [dev]elopment of hypotheses fo[r ...] [investi]gations.

Received August 10, 1953.


References

1. Munroe, Ruth L. Rorschach findings on [college] students having different constellations of [...] scores on the ACE. J. consult. Psychol., 1946, 10, 301-316.

2. Rapaport, D. Diagnostic psychological testing: Vol. 1. Menninger Clinic Monograph Series No. [...]. [...]: Year Book Publishers, [...].

3. [...] Rorschach study of a gro[up ...] technicians. J. consult. Psychol., [...], 10, 317-327.

4. Super, D. E. Appraising vocational fitness. New York: Harper and Brothers, 1949.



THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954


The Effect of Methods of Presentation and Examining Conditions on Student Achievement in a Correspondence Course


Francis J. Di Vesta


Syracuse University *


Correspondence courses form an important communication medium for education in both military and civilian instructional areas. These courses may take the form of highly commercialized businesses independent of an educational institution, home study courses conducted by schools, or extension courses supported and administered by military and other governmental agencies. The number of [students] enrolled in such courses probably numbers 100,000 or more per year. Despite the importance of this particular segment of the educational system, a review of the literature reveals a most obvious scarcity of studies applying directly to effective methods for the presentation of these courses. Accordingly, when the question concerning the most effective method of presenting correspondence courses occurred in a military extension course institute, a ready answer was not available in the literature. Consequently, it was decided to conduct an experiment in which correspondence course students would be used as the experimental population. The present report is a summary of the findings from this experiment.


The Problem


The problem in this study was to determine the most adequate methods of presenting course materials for effective student achievement. Two major hypotheses were the specific foci of the experiment. The first was that three styles of presenting correspondence course materials would result in differential student achievement. The second hypothesis was that quality control, as imposed by examining conditions, would affect the achievement of the students and their retention of level of achievement.


* This study was conducted in the Officer Education Division of the Human Resources Research Institute, Maxwell Air Force Base, Alabama. Dr. [...] McDonald, Extension Course Institute of the Air Force, provided useful guidance and assistance [in the] design and administration of the experiment.


Procedure 


The study was conducted with applicants 
enrolling for a physical training course. This 
course was at the Officer Candidate School 
level of difficulty and was devoted to an un- 
derstanding of the development of physical 
education programs for combat fitness. 


The manual or text in the course was prepared 
in three different “styles.” Style A was a manual 
written in a popular and personal manner with 
several illustrations of the cartoon variety. This 
manual was commonly referred to as the Popular 
Science style for descriptive purposes. Style B 
was written in the formal expository manner 
commonly used in textbooks. Detailed illustra- 
tions were used in the text manual. Style C was 
actually a study guide divided into several “lessons”
or units. Each unit had its major objective(s),
references and questions. An Air Force
field manual was provided with the study guide 
for reference purposes. This style was known 
as the “Chicago” style because it was generally 
fashioned after the syllabi used in the University 
of Chicago home study courses. 

The research design is described briefly below. 

1. All enrollees were administered, through the 
mails and before receiving course materials, a 
pre-test of fifty items. 

2. Each enrollee was then assigned to one of 
the following experimental groups: 


Kind of Examination          Style of Material

Open book examination        Style A    Style B    Style C
Closed book examination      Style A    Style B    Style C


Assignment to these groups was made on the 
basis of pre-test scores so that equal numbers of 
students in each quartile would appear in each 
of the experimental groups.

3. Enrollees in the groups taking the open 
book examination received the “final” examina- 
tion at the time they received the course mate- 
rials to complete in any manner they desired. 

Enrollees in the groups taking the closed book 
examination named a proctor. When the individual
felt he was ready to take the examination,
he reported to the proctor who, in turn,
administered the examination. The completed 
examination was returned by the proctor, with 
certification, to the administration center. 

Course materials were returned by both groups 
at the time the examinations were returned to 
the administration center. 

4. Thirty days after taking the final examina- 
tion, students took the same examination a sec- 
ond time. This test was known as the “reten- 
tion” examination. Enrollees in both open and 
closed book examination groups took the reten- 
tion examination without course materials. The 
“open book” group took the retention examina- 
tion without a proctor and the “closed book” 
group under the same proctorship as in the original
administration.

5. When the retention examination was re- 
ceived by the administrative office the course 
materials were returned to the student for his 
files. 

6. Enrollees were notified at the start of the
study that they were to participate in a research
[...] through regular mailing procedures. A total of [...]
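The quartile-balanced assignment described in step 2 of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the article says only that equal numbers of students from each pre-test quartile were placed in each group, so the shuffle-and-deal rule, the function name `assign_by_quartile`, and the pre-test scores are all hypothetical.

```python
import random
from collections import Counter

# The six experimental groups: two examining conditions x three manual styles.
GROUPS = [(exam, style) for exam in ("open", "closed") for style in "ABC"]


def assign_by_quartile(pretest_scores, seed=0):
    """Assign each enrollee to a group so that every pre-test quartile
    contributes (as nearly as possible) equally to every group.

    ASSUMPTION: shuffling within a quartile and dealing round-robin is an
    illustrative rule; the article does not describe the mechanics.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get)  # ids, low to high
    n = len(ranked)
    assignment, quartile_of = {}, {}
    for q in range(4):
        members = ranked[q * n // 4:(q + 1) * n // 4]
        rng.shuffle(members)
        for k, enrollee in enumerate(members):
            assignment[enrollee] = GROUPS[k % len(GROUPS)]
            quartile_of[enrollee] = q
    return assignment, quartile_of


# Hypothetical pre-test scores on the fifty-item test for 120 enrollees.
score_rng = random.Random(1)
scores = {f"E{i:03d}": score_rng.randint(20, 45) for i in range(120)}

assignment, quartile_of = assign_by_quartile(scores)
counts = Counter((quartile_of[e], group) for e, group in assignment.items())
for (quartile, group), count in sorted(counts.items()):
    print(quartile + 1, group, count)
```

With 120 hypothetical enrollees each quartile holds 30 people, so every one of the 24 quartile-by-group cells receives exactly five.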


Results 


The means for each of the groups on the pre-test ¹ are shown in Table 1. Although enrollees were assigned in equal [numbers to] each of the experimental groups, more students in the open book examination groups completed the course than did those in the closed book examination groups.


Table 1

Pre-Test Scores for Each of the Experimental Groups

           Closed Book Exam                    Open Book Exam
Style*     N     M             S.D.            N             M             S.D.

A          49    32.4          3.9             68            32.3          4.7
B          36    32.9          3.5             [illegible]   [illegible]   4.4
C          51    [illegible]   3.7             66            31.3          [illegible]

F (open vs. closed book) = 2.2; p = >.05. F (between styles) = 1.6; p = >.05.
* See text for description.


¹ The pre-test was an examination used for previous classes of enrollees. A new examination was used for the “final” examination in the experiment.


Table 2

Final Examination Scores for the Experimental Groups

           Closed Book Exam            Open Book Exam
Style*     N     M      S.D.           N     M      S.D.

A          49    42.5   5.8            68    48.8   [illegible]
B          46    44.2   7.3            73    50.1   [illegible]
C          51    45.2   8.7            67    48.3   [illegible]

F (between styles) = 2.8; p = >.05. t (closed vs. open book) = 6.5; p = <.01.
* See text for description.
The variance between the “open book” and “closed book” subjects’ performance on the pre-test was not significant (p > .05) as revealed by the F of 2.22. The F of 1.59 was not significant (p > .05) for the variance between pre-test scores for groups of subjects assigned each type of material. On the basis of these data, and in the absence of any definite information about the subjects’ characteristics, it was assumed that the groups were from the same population.

The second step in the analysis was to determine differences in performance of each of the experimental groups on the final examination. The mean achievement scores and standard deviations are shown in Table 2 for each of the experimental groups.

The difference between the achievement of subjects who were assigned course materials written in different styles was not significant. Subjects taking the open-book examination, however, achieved significantly (p < .01) higher scores than those taking the closed-book examination (t = 6.54).

A similar analysis was made for the retention test results. The retention test was in every respect the same test that was given for the final examination. It was given thirty days after receipt of the final examination


Table 3

Retention Examination Scores for the Experimental Groups

           Closed Book Exam            Open Book Exam
Style*     N     M      S.D.           N     M      S.D.

A          47    42.3   5.9            67    45.0   7.1
B          45    41.0   12.7           70    45.6   8.7
C          51    44.5   8.6            67    45.7   8.3

F (between styles) = 0.9; p = >.05. t (open vs. closed book) = 3.0; p = .01.
* See text for description.


and was administered in the absence of for- 
mal course materials. Again the style in 
which the course was written appeared to 
have no significant effect on the retention of 
the students’ achievement level. The quality 
control did, however, have an effect. The 
achievement scores of individuals who took 
the closed-book examination were significantly 
(t= 2.96) lower than those who took the 
open-book examination. The means and
standard deviations for these groups are
shown in Table 3.
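The open-book versus closed-book comparison on the retention examination can be approximated from the summary figures in Table 3 alone. The sketch below is illustrative rather than the author's computation: it assumes the table entries read as means and standard deviations with one decimal place, that the standard deviations were computed with N - 1, and that a pooled-variance t test was used; the helper names are hypothetical.

```python
import math

# (N, mean, S.D.) for styles A, B, C, read from Table 3.
# ASSUMPTION: decimal points are restored to one place (e.g. 42.3, 5.9).
closed_book = [(47, 42.3, 5.9), (45, 41.0, 12.7), (51, 44.5, 8.6)]
open_book = [(67, 45.0, 7.1), (70, 45.6, 8.7), (67, 45.7, 8.3)]


def combine(groups):
    """Merge subgroup summaries into overall N, mean, and sum of squares."""
    n = sum(g[0] for g in groups)
    mean = sum(g[0] * g[1] for g in groups) / n
    # within-subgroup sum of squares plus between-subgroup sum of squares
    ss = sum((g[0] - 1) * g[2] ** 2 + g[0] * (g[1] - mean) ** 2 for g in groups)
    return n, mean, ss


def pooled_t(groups_1, groups_2):
    """Pooled-variance t for the difference between two combined conditions."""
    n1, m1, ss1 = combine(groups_1)
    n2, m2, ss2 = combine(groups_2)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (m2 - m1) / standard_error


t = pooled_t(closed_book, open_book)
print(f"t (open vs. closed book) = {t:.2f}")
```

Under these assumptions the statistic comes out close to the value reported in the text, which lends some support to the decimal placement adopted for the table.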

The average individual loss between the final and retention examination and the extent to which the experimental variables were a factor in retention of achievement level is shown in Table 4. Differences in losses were not found to be significant for the types or styles of materials. The losses in achievement level of those subjects who took the closed-book examination, on the other hand, were significantly less (t = 5.67) than the losses in achievement of subjects who took the open-book examination.


Table 4

Losses During the Time Between the Final Examination and the Retention Examination for Each of the Experimental Groups

           Closed Book Exam            Open Book Exam
Style*     N     M      S.D.           N     M      S.D.

A          47    -0.3   4.9            67    -3.9   4.6
B          42    -0.5   4.6            69    -3.8   5.6
C          51    -0.7   3.6            67    -2.6   5.7

F (between styles) = 0.8; p = >.05. t (open vs. closed book) = 5.8; p = <.01.
* See text for description.


Summary 


The present study is a report of an experi- 
ment conducted with a correspondence course 
population. Two hypotheses were investi- 
gated during the course of the experiment. 
One hypothesis was that three different styles 
of presenting course materials would have 
differential effects on student achievement 
and retention of achievement level. The sec- 
ond hypothesis was that the degree of quality 
control, as imposed by the open and closed 
book examination, would have no effect on 
the achievement level and retention of the 
achievement level by students. 

The styles of presenting course materials 
(popular, expository, and study guide) were 
not found to be different in their relative ef- 
fectiveness as measured by an achievement 
examination. Nor did these methods affect 
the retention of the achievement level. On 
the other hand, the subjects who used the ex- 
amination with reference to course materials 
(open book examination) had higher final 
and retention examination scores than did 
those students who took the examination un- 
der monitorship without the use of the text 
materials (closed book examination). The 
subjects who took the closed book examina- 
tion maintained their original achievement 
level while those who took the open book 
examination made significant losses over a 
thirty-day period.

The administrative procedures required in 
conducting the experiment with correspond- 
ence course populations were found to be too 
ponderous for practical purposes. The rec- 
ommendation is made that hypotheses be 
tested with more readily available populations 
of subjects and the results applied to corre- 
spondence course usage. 


Received August 21, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY 
Vol. 38, No. 4, 1954 


The Use of Levels of Confidence in Item Analysis 


Valentine Appel ¹

Richardson, Bellows, Henry & Co., Inc.

and

David Kipnis ²

New York University


One of the problems of item analysis is the
standard to be employed in selecting items
for inclusion in a final scoring key. Typically,
some level of confidence is arbitrarily
chosen and those items which discriminate at
this level are selected. Guilford (6) as well
as others have indicated that the 5% and 1%
levels of confidence should be used as guides
when selecting items.

Little consideration has been given in the literature to the influence of the size of the item analysis sample upon the level of confidence at which [...] discriminate. [...] availability of large samples, and so this problem has probabl[y ...] [particu]larly important, [...] often impossible [...] [ap]plied research. [...] than 100 cases.

Item validities, when computed against an external criterion, are typically low. Given a small item analysis sample, the resulting expected item validities often cannot be reasonably expected to exceed levels of confidence as rigorous as the conventional 1% and 5% levels. The establishment of such rigorous standards would therefore be expected to result in the rejection of a large number of truly discriminating items.

This was recently demonstrated in a study by Feldman (4), although within a narrow range of levels of confidence. He used the 1%, 2%, and 5% levels as standards for item inclusion with high and low criterion groups of 42 cases each. On cross validation, he found that the key containing only those items which discriminated at the 1% level was generally less valid than the keys containing all items which discriminated at the less rigorous 2% and 5% levels.


¹ Presently with Nowland & Schladermundt, Greenwich, Conn.

² Formerly with Richardson, Bellows, Henry & Co., Inc.

It would be expected that, given a large item analysis sample, more of the items will show validity exceeding a rigorous level of confidence. The establishment of a less rigorous standard would therefore be more likely to result in a greater proportion of nonvalid than valid variance being added to the scoring key. The problem becomes one of striking the most favorable balance between the number of truly valid items rejected and the number of truly nonvalid items selected.

The purpose of this study was to test the general point that an important consideration in establishing a standard for item inclusion is the size of the item analysis sample available. More specifically, the hypothesis was tested that for maximal test validity, the smaller the sample size available, the less rigorous should be the level of confidence selected as a standard for item inclusion. Conversely, given a large sample size, maximal validity can be achieved by establishing a more rigorous standard.


Method 


Instruments and Population Employed. The study was performed using the RBH Supervisory Judgment Test (SJT) to predict an intelligence test criterion provided by the short form of the Armed Forces Qualification Test (AFQT).

The SJT is a test which has been found useful for the prediction of supervisory success. It c[ontains] 33 items of four and five alternatives in which the examinee is presented with a series of supervisory problems and asked to choose the best and worst alternative for solving each.

The AFQT is a 25-minute timed intelligence test [composed] of 45 items covering the areas of verbal, mathematical, and spatial abilities.

[As a result] of a supervisory selection study which had been recently completed (5), a number of cases were available in which examinees had been administered both the SJT and the AFQT. The experimental population consisted of 540 first, second, and third line civilian supervisors at two United States Army Arsenals.

Item Analysis. For purposes of item analysis the [experimental] population was randomly divided into three samples of 80, 150, and 300 cases. The remaining 10 cases of the 540 were not included in the item analysis. They were [used], however, in the validation series. [For each] of these samples the following procedure was followed: High and low criterion groups were designated by selecting the upper and lower 27% of the AFQT distribution. The percentage of cases in the high and low criterion groups who responded to each SJT alternative was then determined and the significance of the difference between the percentages in the two groups was computed.
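The significance of a difference between the high and low criterion groups on one alternative might be computed as below. The article does not say which test was applied, so the pooled two-proportion normal approximation here is an assumption, as are the function name and the example counts (groups of 22 correspond roughly to 27% of the 80 case sample).

```python
import math


def two_proportion_test(x_high, n_high, x_low, n_low):
    """Return (z, two-sided p) for the difference between two proportions.

    ASSUMPTION: a pooled-variance normal approximation; the article does not
    name the significance test it used.
    """
    p_high = x_high / n_high
    p_low = x_low / n_low
    pooled = (x_high + x_low) / (n_high + n_low)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    if standard_error == 0:
        return 0.0, 1.0
    z = (p_high - p_low) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return z, p_value


# Hypothetical alternative: chosen by 15 of 22 high-criterion cases
# and by 5 of 22 low-criterion cases.
z, p_value = two_proportion_test(15, 22, 5, 22)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

An alternative like this one would pass even the 1% standard, whereas the same percentage difference in much smaller groups would not, which is the dependence on sample size that the study examines.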
Construction of Scoring Keys. From the item analysis data derived from each of the three samples, four plus and minus unit weighted scoring keys were constructed for predicting the AFQT [criterion]. These keys were composed of all [SJT] alternatives discriminating at and beyond the 1%, 5%, 20%, and 50% levels of confidence, one key being constructed for each of these four confidence levels. A total of 12 scoring keys were constructed in all. The number of scored alternatives comprising each of these keys is summarized in Table 1.
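A plus and minus unit weighted key of the kind just described could be assembled as in the sketch below. The data structure, the alternative labels, and the rule that an alternative favored by the high criterion group receives +1 (and one favored by the low group -1) are assumptions made for illustration; the article does not spell out the weighting direction.

```python
def build_key(item_stats, level):
    """Return {alternative: +1 or -1} for alternatives significant at `level`.

    item_stats maps an alternative label to
    (proportion choosing it in the high group,
     proportion choosing it in the low group,
     p-value of the difference).
    ASSUMPTION: +1 when the high group chose it more often, -1 otherwise.
    """
    key = {}
    for alternative, (pct_high, pct_low, p_value) in item_stats.items():
        if p_value <= level and pct_high != pct_low:
            key[alternative] = 1 if pct_high > pct_low else -1
    return key


def score(chosen_alternatives, key):
    """Sum the unit weights of the alternatives an examinee selected."""
    return sum(key.get(alternative, 0) for alternative in chosen_alternatives)


# Hypothetical item analysis results for four alternatives.
item_stats = {
    "1a-best": (0.68, 0.23, 0.002),
    "1c-worst": (0.20, 0.45, 0.08),
    "2b-best": (0.40, 0.55, 0.30),
    "2d-worst": (0.10, 0.50, 0.004),
}

# One key per confidence level, as in the study's 1%, 5%, 20%, and 50% keys.
keys = {level: build_key(item_stats, level) for level in (0.01, 0.05, 0.20, 0.50)}
for level, key in keys.items():
    print(f"{level:.0%} key scores {len(key)} alternatives")

print(score(["1a-best", "2d-worst"], keys[0.50]))
```

Relaxing the level can only add alternatives to a key, which is why the counts in Table 1 rise from the 1% row to the 50% row within each sample size.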

Validation of Constructed Keys. To validate the scoring keys which were developed on the basis of the item analysis, it was essential to employ samples independent of the samples from which the keys had been developed. In order to fulfill this requirement and also to make maximal use of the available data, a procedure was followed similar to one recently proposed by Katzell (7).


Table 1

Number of Scored Alternatives in Each Key *

Level of          Item Analysis Sample Size
Confidence        80       150      300

1%                 8        27       55
5%                34        52       96
20%               82       110      143
50%              167       187      221

* The total possible number of scored alternatives, including “best” and “worst” responses, was 302.


Table 2

Validity Coefficients for the Twelve Keys

Item Analysis                    Level of Confidence
Sample Size       Group*     50%     20%     5%      1%

                  A          .664    .576    .596    .501
80                B          .676    .611    .563    .516
                  C          .617    .636    .523    .392
                  Mean       .653    .608    .561    .471

                  D          .699    .730    .702    .634
150               E          .651    .604    .528    .496
                  F          .677    .689    .670    [illegible]
                  Mean       .676    .677    .639    .623

                  G          .711    .735    .714    .700
300               H          .647    .654    .651    .605
                  I          .585    .612    .614    .605
                  Mean       .650    .670    .661    .639

* Each group is composed of 60 cases.
The cases employed in each of the item analysis samples were systematically reassigned so that the scoring keys constructed from one item analysis sample were employed to score cases selected from the other samples. Thus, for example, the cases which were employed in the item analysis of the 300 case sample were systematically redistributed to form groups which could be used to score the keys which were developed from the 80 and 150 case samples. The cases from the 80 and 150 case samples were similarly reas[signed].

Following this procedure, nine independent validation groups (designated A through I), each containing 60 cases, [were formed. Groups A,] B, and C were assigned to be scored with the four scoring keys devel[oped from the 80 case] item analysis sample; [Groups D, E, and F were] assigned to be scored [with the four keys] developed from the 150 case item analysis sample; and Groups G, H, and I [...] the four keys developed from t[he 300 case item] analysis sample. [The correla]tions of each of the four [keys with the] criterion were then compu[ted ...] 60 case validation groups.

Results 


The validity coefficients computed for each of the keys on each of the validation groups are summarized in Table 2. To test the hypothesis that the differences among the validities of the various keys could be attributed to errors of sampling, an analysis of variance of the validity coefficients was carried out. Since it is known that the sampling distribution of r’s does not meet the assumption of normality required for [...], each r was transformed [to z], the distribution of which [is approximately nor]mal (3). [The analysis was car]ried out according to the Type I design out[lined ...] and also discussed by [...]. The results of this analysis have been summarized in Table 3.


Table 3

Analysis of Variance of the Validity Coefficients

Source of Variation                      Sum of Squares    df    Mean Square    F

Between sample sizes                     .1266              2    .0633          1.92
Between groups of same size              .1974              6    .0329
Total between groups                     .3240              8

Between levels of confidence             .0903              3    .0301          [illegible]
Level of confidence × sample size        .0528              6    .0088          [illegible] *
Pooled groups × level of confidence      .0580             18    .0032
Total within groups                      .2011             27

Total                                    .5251             35

* Significant at the 5% level.
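The transformation applied to each r before the analysis of variance is presumably Fisher's z, that is, the inverse hyperbolic tangent of r; the text is damaged at that point, so this reading is an assumption. The short sketch below applies it to the three 50% key coefficients for the 80 case sample in Table 2 and averages them in the z metric.

```python
import math

# Validity coefficients of the 50% key in Groups A, B, and C (Table 2).
coefficients = [0.664, 0.676, 0.617]

# ASSUMPTION: the transformation used was Fisher's z = atanh(r).
z_values = [math.atanh(r) for r in coefficients]

# Average in the z metric, then transform back to a correlation.
mean_z = sum(z_values) / len(z_values)
mean_r = math.tanh(mean_z)

for r, z in zip(coefficients, z_values):
    print(f"r = {r:.3f}  ->  z = {z:.3f}")
print(f"mean r via z = {mean_r:.3f}")
```

The back-transformed mean agrees with the Mean entry printed for that column of Table 2, which suggests, though the article does not say so, that the row means were also obtained through z.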
[The error terms emplo]yed in this analysis [...]. Since the variance be[tween ...] the interaction [...] and sample size, [...] estimates derive[d ...] the four coefficients in any row of Table 2 [...] tance was [...] terms for groups of the same sample size by level of confidence. When tested against this error term, the variance attributable to the interaction between level of confidence and sample size was significant beyond the 5% level. The variance attributable to this interaction was therefore employed as the error term in testing the significance of the level of confidence main effect.
Only the variance attributable to the interaction between level of confidence and sample size was significant beyond the 5% level. Neither of the main effects was statistically significant. This may be interpreted to mean that, within the limits of the present study, there is no one optimal level of confidence to be employed as a standard for item inclusion. Rather, the optimal level of confidence is a function of the sample size employed for item analysis.

Examination of Table 2 would indicate that the smaller the sample available for item analysis, the less rigorous should be the level of confidence employed. In short, the hypothesis tested was essentially substantiated.
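The F ratios behind these conclusions can be recomputed from the sums of squares and degrees of freedom in Table 3. The sketch follows the error terms as the text describes them (interaction tested against pooled groups by level; the level main effect tested against the interaction); testing sample size against groups of the same size is an inference from the printed F of 1.92, and the dictionary keys are illustrative labels.

```python
# (sum of squares, degrees of freedom) taken from Table 3.
sources = {
    "sample_sizes": (0.1266, 2),
    "groups_same_size": (0.1974, 6),
    "levels": (0.0903, 3),
    "levels_x_sizes": (0.0528, 6),
    "groups_x_levels": (0.0580, 18),
}

mean_squares = {name: ss / df for name, (ss, df) in sources.items()}

# Sample size main effect against groups of the same size (inferred error term).
f_sizes = mean_squares["sample_sizes"] / mean_squares["groups_same_size"]
# Interaction against pooled groups x level of confidence (as stated in the text).
f_interaction = mean_squares["levels_x_sizes"] / mean_squares["groups_x_levels"]
# Level main effect against the interaction (as stated in the text).
f_levels = mean_squares["levels"] / mean_squares["levels_x_sizes"]

print(f"F (sample sizes)        = {f_sizes:.2f}  on 2 and 6 df")
print(f"F (level x sample size) = {f_interaction:.2f}  on 6 and 18 df")
print(f"F (levels)              = {f_levels:.2f}  on 3 and 6 df")
```

For reference, the tabled 5% points are about 2.66 for 6 and 18 degrees of freedom and 4.76 for 3 and 6, so only the interaction exceeds its critical value, in line with the text.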


Discussion 

The results of this study, insofar as they may be generalized, indicate that there is no one optimal level of confidence which should be employed when item analyzing test [items]. Particularly pertinent is the result that such arbitrarily designated confidence levels as the 1% and 5% often cannot be expected to result in maximal cross validities. In [many] cases, particularly when the size of the sample available for item analysis is small, a much less stringent standard may be expected to result in higher validities than the more conventional 1% or 5% levels.

Especially striking is the fact that, [for] all sample sizes employed in this study, the [...] scoring keys consistently resulted in higher validities than the keys composed of items which discriminated at the 1% level. It should be noted, however, that the validities of the 50% key based upon the 300 case sample had started to shrink although the 5% and 1% keys showed continuous increments in validity as the item analysis sample sizes were increased. This would suggest that if [larger] samples had been employed in the item analyses the greatest validities would have been produced by the 5% and 1% keys.

It would appear that any arbitrarily chosen level of confidence is likely to be a poor standard for item inclusion. Levels of confidence, if they are to be employed at all, ought to consider the sample size available for the [item analysis]. The smaller the sample size, the less rigorous should be the level of confidence required.

It would seem that standards for item inclusion might profitably be established without any reference to levels of confidence. That is, instead of specifying in advance that only items discriminating at the, say, 1% or 5% [level] be included in the test, an alternative [procedure] is suggested. Such a procedure would entail the computation of item validity indices which are independent of the sample size upon which the item analysis is based, e.g., biserial r, phi coefficient, etc. The items would then be arranged in descending order of validity and a cutting point selected above which items would be selected for inclusion in the scoring key and below which they would be discarded. Since few principles are available as to where the optimal cutting point should be, the decision as to what constitutes minimally acceptable item validity will probably have to be an arbitrary one based upon the judgment of the test constructor.
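The alternative procedure suggested above (compute a sample-size-free validity index, rank the items, and pick a cutting point) might look like the following sketch. The phi coefficient is one of the indices the authors name; the 2 x 2 counts, the item labels, and the cutting point of .20 are hypothetical choices made only to show the mechanics.

```python
import math


def phi(a, b, c, d):
    """Phi coefficient for a 2 x 2 table:
    a = high group choosing the alternative, b = high group not choosing it,
    c = low group choosing it,               d = low group not choosing it.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denominator == 0 else (a * d - b * c) / denominator


# Hypothetical counts for three alternatives, criterion groups of 22 cases each.
items = {
    "item_1": (15, 7, 5, 17),
    "item_2": (12, 10, 10, 12),
    "item_3": (18, 4, 9, 13),
}

validities = {name: phi(*counts) for name, counts in items.items()}

# Arrange in descending order of validity, then apply an arbitrary cutting point.
ranked = sorted(validities, key=validities.get, reverse=True)
CUTTING_POINT = 0.20  # ASSUMPTION: chosen by judgment, as the text anticipates
selected = [name for name in ranked if validities[name] >= CUTTING_POINT]

for name in ranked:
    print(f"{name}: phi = {validities[name]:.3f}")
print("selected:", selected)
```

Unlike a significance level, the phi values would not change if the same proportions were observed in larger criterion groups, which is the property the authors are after.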




Only after the items have been selected 
should any reference be made to the level of 
confidence at which they discriminate. The 
level of confidence corresponding to the mini- 
mally acceptable standard of item validity 
can then be determined, and the number of 
items exceeding this standard can be com- 
pared with chance expectancies (1). If the 
selected number of item alternatives exceeds 
chance expectancy, it is likely that a scoring 
key composed of these items will continue to 
discriminate if applied to new samples. 
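A crude version of the chance-expectancy comparison can be made with the counts in Table 1. The sketch below simply multiplies the 302 possible scored alternatives by each level of confidence; this ignores the dependence among alternatives within an item and is not the procedure of reference (1), so it should be read as a rough yardstick only. The row labels of Table 1 are taken as 1%, 5%, 20%, and 50%, which is itself a reading of a damaged table.

```python
TOTAL_ALTERNATIVES = 302  # from the note to Table 1

# Observed numbers of scored alternatives, by level of confidence and
# item analysis sample size (Table 1; row labels are an assumed reading).
observed = {
    0.01: {80: 8, 150: 27, 300: 55},
    0.05: {80: 34, 150: 52, 300: 96},
    0.20: {80: 82, 150: 110, 300: 143},
    0.50: {80: 167, 150: 187, 300: 221},
}

# Number of alternatives expected to reach each level by chance alone,
# ASSUMING independent tests (a simplification).
expected = {level: level * TOTAL_ALTERNATIVES for level in observed}

for level, by_sample in observed.items():
    for sample_size, count in by_sample.items():
        ratio = count / expected[level]
        print(f"{level:>4.0%} key, n = {sample_size:>3}: "
              f"observed {count:>3}, chance {expected[level]:6.2f}, ratio {ratio:4.1f}")
```

On this rough basis every key contains more alternatives than chance would supply, with the margin widest for the stricter levels and the larger samples.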


Received August 31, 1953. 


References 


1. Brozek, J. and Tiede, K. Reliable and question- 
able significance in a series of statistical tests. 
Psychol. Bull., 1952, 49, 339-341. 

2. Edwards, A. L. Experimental design in psycho- 
logical research. New York: Rinehart & Co., 
1950. 

3. Ely, J. H. Studies in item analysis 2: Effects of 
various methods upon test reliability. J. appl. 
Psychol., 1951, 35, 194-203. 

4. Feldman, M. J. The effects of the size of criterion
groups and the level of significance in
selecting test items on the validity of tests.
Educ. Psychol. Measmt., 1953, 13, 273-279.

5. PRB Research Note 12, Edgerton, H. A. and
Thomson, K. F., et al. Development of Techniques
for the Selection of Wage Board Supervisors
at Army Arsenals, 30 June 1953.
Copies of this report may be obtained from
the American Documentation Institute, Auxiliary
Publications Project, Photoduplication
Service, Library of Congress, Washington 25,
D. C. Order Document Number 4092: Microfilm
copy, $3.50; photostat copy, $10.00.

6. Guilford, J. P. The phi coefficient and chi-square
as indices of item validity. Psychometrika,
1941, 6, 11-19.

7. Katzell, R. A. Cross validation of item analyses.
Educ. Psychol. Measmt., 1951, 11, 16-22.

8. Lindquist, E. F. Design and analysis of experiments
in psychology and education. Boston:
Houghton Mifflin Co., 1953.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954 


Some Computational Short-Cuts in the Development or Analysis 
of Tests 


Angus G. MacLean and Arthur T. Tait 


California Test Bureau, Los Angeles, California 


In developing a new test or in evaluating an existing one, the procedure is to administer it to a sample of the population for which it is intended, and obtain the following statistics: mean; variance; reliability; item difficulties; and item-test correlations.

In addition to these statistics, since the selection of items on the basis of item-test correlations does not insure homogeneity of test content (2), some index of item-value is needed. In a recent article (1) the present authors suggested: (a) computing the inter-item covariance total for each item; and (b) selecting only those items whose covariance totals exceed their variances. The covariance total indicates the contribution of each item to reliability (homogeneity) while the variance points up the contribution to unreliability (heterogeneity).

In the same article a method was described whereby all the information listed above, plus the S-indices (item-selection indices), can be obtained in one operation, and with less computation than is required for computing each of these statistics separately and by the usual formulas. Accuracy is also improved and duplication of computations eliminated. For the sake of exhibiting the rationale and general method, the full computational procedure was given. However, there are many short-cuts and, as the specific purpose varies, some steps may become unnecessary. At this point it is proposed to describe the most economical methods of: (a) effecting the preliminary arrangement of the data; (b) obtaining the mean, variance (standard deviation) and reliability; (c) obtaining the item difficulties and/or the mean item difficulty; (d) obtaining the item-test correlations; and (e) obtaining the selection-indices. These elements are arranged sequentially, i.e., each earlier step is necessary to each subsequent one. Items are assumed to be scored 1 or 0, and ‘reliability’ refers to Kuder-Richardson formula 20.


(a) A count is made (by hand or machine) of the number of cases “passing” each item, denoted by $f_{ii}$, and of the number of cases passing both of every possible pair of items, denoted by $f_{ij}$. If there are $n$ items there will be $n$ $f_{ii}$'s and $n(n-1)/2$ $f_{ij}$'s. These frequencies are then displayed in the ‘F’-matrix, an $n \times n$ table. In row 1 column 1 is placed $f_{11}$. In row 2 column 2 is placed $f_{22}$. The diagonal from top left to bottom right is called the principal diagonal and its elements, $f_{ii}$, divided by $N$, the total number of cases, become the item “difficulties,” usually denoted $p_{ii}$, or proportion passing on each item. In row 1 (corresponding to item 1), all other cell entries indicate the number of cases passing both item 1 and the item corresponding to the column number. Thus in row 1 cell 5 the entry will be the number of cases who gave the correct response to both items 1 and 5. This is denoted by $f_{15}$. Once every subject's response to every item has been punched, the IBM electronic statistical machine can produce the F-matrix very quickly. If there are enough counters, it saves computing time to obtain the sum of the entries in each row at this time.
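
The tallying described in (a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `f_matrix` and the five-case, three-item response table are hypothetical and do not come from the article, which performs the count by hand or on punched-card equipment.

```python
def f_matrix(responses):
    """Build the n x n F-matrix from 0/1 item scores (one row per case).

    F[i][i] counts the cases passing item i; F[i][j] counts the cases
    passing both item i and item j.
    """
    n = len(responses[0])
    F = [[0] * n for _ in range(n)]
    for row in responses:
        passed = [i for i, score in enumerate(row) if score == 1]
        for i in passed:
            for j in passed:
                F[i][j] += 1
    return F


# Hypothetical data: 5 cases, 3 items, scored 1 or 0.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]
F = f_matrix(responses)
row_sums = [sum(row) for row in F]  # recorded while the matrix is tallied
print(F)         # [[4, 3, 2], [3, 4, 2], [2, 2, 2]]
print(row_sums)  # [9, 9, 6]
```

The row sums are kept alongside the matrix because the later steps (item-test covariances and selection indices) start from them.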

(b) Once the F-matrix (which should, by the way, be symmetrical, i.e., $f_{ij} = f_{ji}$) is obtained, there is little computing to do. First, if this is not already accomplished by machine, obtain the row sums, and then obtain $T$, the sum of these. Thus $T$ is the sum of all the entries in the matrix. A quicker way to compute $T$, if only the mean, variance (S.D.) and reliability are desired, is to sum the frequencies on one side of the principal diagonal, double this sum and add $\sum f_{ii}$, which must be obtained separately anyway. It helps to avoid mistakes if a line is ruled along the principal diagonal, from the top left corner of the matrix to the bottom right.

* When neither a tabulator nor an IBM Electronic Statistical Machine is available, it is best not to construct an F-matrix, but to adopt a procedure which will be described below.

It is also necessary to obtain $\sum f_{ii}$ and $\sum f_{ii}^2$; these are computed simultaneously on modern desk calculators.

Operations of the type $kx - yz$ or $kx - y^2$ are single operations on modern calculators, so that a computational entity has come into being known as $L$, defined as $N^2\sigma^2$; that is, if $L$ is divided by $N^2$ the variance is obtained. $L$ is defined by:

$$L_x = N\sum X^2 - \left(\sum X\right)^2. \tag{1}$$

In the Kuder-Richardson formula 20,

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right), \tag{2}$$

where $\sigma_i^2$, the variance of item $i$, is defined by

$$\sigma_i^2 = p_{ii} - p_{ii}^2. \tag{3}$$

But it is unnecessary to divide both numerator and denominator in (2) by $N^2$, so

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum L_{ii}}{L_t}\right), \tag{4}$$

where $\sum L_{ii} = N\sum f_{ii} - \sum f_{ii}^2$ and $L_t = NT - \left(\sum f_{ii}\right)^2$. Then

$$\sigma_t^2 = \frac{L_t}{N^2} \tag{5}$$

and

$$M = \frac{\sum f_{ii}}{N}, \tag{6}$$

where $\sigma_t^2$ denotes the variance of the test and $M$ the mean.


In (2) the factor $\frac{n}{n-1}$ is present because the sum of the inter-item covariances, or $\sum \sigma_{ij}$ $(i \neq j)$, is not precisely the true variance, $\sigma_\infty^2$, but

$$\sum \sigma_{ij} = \frac{n-1}{n}\,\sigma_\infty^2. \tag{7}$$

Therefore, since reliability is defined as the ratio of true to total variance, or that proportion of the variance which is not attributable to error,

$$r_{tt} = \frac{\sigma_\infty^2}{\sigma_t^2}. \tag{8}$$

The square of the standard error of measurement is defined by

$$\sigma_e^2 = \sigma_t^2 - \sigma_\infty^2 \tag{9}$$

$$\sigma_e^2 = \frac{n}{n-1}\sum \sigma_i^2 - \frac{\sigma_t^2}{n-1}, \tag{10}$$

but the usual formula, which is equally efficient provided enough significant figures are used for $r_{tt}$, is

$$\sigma_e^2 = \sigma_t^2\left(1 - r_{tt}\right), \tag{11}$$

which, incidentally, demonstrates that

$$\sigma_\infty^2 = r_{tt}\,\sigma_t^2. \tag{12}$$

It may be of interest to note that (5) may be written

$$\sigma_t^2 = \frac{NT}{N^2} - \frac{\left(\sum f_{ii}\right)^2}{N^2} = \frac{T}{N} - M^2. \tag{13}$$
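
As an illustration of step (b), the following Python sketch computes the mean, variance, Kuder-Richardson formula 20 reliability and squared standard error of measurement directly from an F-matrix, using the $L$ quantities and the short-cut for $T$ (double the entries on one side of the diagonal and add the diagonal). The function name `summary_from_f_matrix` and the small F-matrix are hypothetical; they are assumptions made for the example, not material from the article.

```python
def summary_from_f_matrix(F, N):
    """Mean, variance, KR-20 and squared S.E. of measurement from an F-matrix."""
    n = len(F)
    diag = [F[i][i] for i in range(n)]
    sum_f = sum(diag)                     # sum of f_ii (equals the sum of test scores)
    sum_f2 = sum(f * f for f in diag)     # sum of f_ii squared
    upper = sum(F[i][j] for i in range(n) for j in range(i + 1, n))
    T = 2 * upper + sum_f                 # sum of all entries in the matrix
    L_t = N * T - sum_f ** 2              # N^2 times the test variance
    sum_L_ii = N * sum_f - sum_f2         # N^2 times the sum of item variances
    variance = L_t / N ** 2
    kr20 = (n / (n - 1)) * (1 - sum_L_ii / L_t)
    return {
        "mean": sum_f / N,
        "variance": variance,
        "kr20": kr20,
        "sem_squared": variance * (1 - kr20),
    }


# Hypothetical F-matrix for 3 items and N = 5 cases.
F = [[4, 3, 2],
     [3, 4, 2],
     [2, 2, 2]]
stats = summary_from_f_matrix(F, N=5)
print(stats["mean"], stats["variance"])  # 2.0 0.8
print(round(stats["kr20"], 2))           # 0.45
```

No division by $N^2$ is needed for the reliability itself, which is the point of working with $L$ rather than with variances.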


(c) If the separate item “difficulties” are required, e.g., to arrange the items in order of difficulty, obtain $\frac{1}{N}$ to sufficient significant figures and lock it in as a constant multiplier, then run down the principal diagonal converting each $f_{ii}$ in turn into $p_{ii}$; i.e.,

$$p_{ii} = \frac{f_{ii}}{N}. \tag{14}$$

If on the other hand only the mean difficulty is required,

$$\bar{p} = \frac{1}{nN}\sum f_{ii} = \frac{M}{n}. \tag{15}$$

Should the variance of item difficulties be desired, it is clearly obtained by

$$\sigma_p^2 = \frac{n\sum f_{ii}^2 - \left(\sum f_{ii}\right)^2}{n^2N^2}. \tag{15a}$$

(d) If item-test correlations (and/or selection indices) are required, the sum of each row in the F-matrix should have been recorded, e.g., in a column to the right of the matrix. Then first obtain $\sigma_{it}$, the item-test covariance, as follows:

$$\sigma_{it} = \frac{1}{N}\left(\sum_j f_{ij} - \frac{f_{ii}}{N}\sum f_{ii}\right), \tag{16}$$

where $\sum_j f_{ij}$ denotes the row-total for item $i$. If difficulties have been recorded in a column, $p_{ii}$ can be substituted for $\frac{f_{ii}}{N}$. In (16) $\sum f_{ii}$ is, of course, a constant and it is simpler from a computational point of view to make two columns out of (16), first locking in $\sum f_{ii}$ and obtaining the differences and then multiplying them by the locked-in reciprocal of $N$. If, however, the items are to be selected by use of the S-index rather than by selecting the best $r_{it}$'s, do not use (16) at all, but see section (e) below.

The item-test correlations are obtained by:

$$r_{it} = \frac{\sigma_{it}}{\sqrt{\sigma_i^2\,\sigma_t^2}}, \tag{17}$$

where $\sigma_i^2$ and $\sigma_t^2$ are already available. This is the point-biserial correlation between item and total test.

(e) The selection index for item $i$, $S_i$, is defined by

$$S_i = \sigma_{it} - 2\sigma_i^2. \tag{18}$$

[For the full explanation see reference (1).] If $S_i$ is negative, item $i$ should be rejected, the more so, the greater its absolute value. However, if $\sigma_{it}$ has not been computed, $S_i$ is defined by

$$S_i = \frac{1}{N^2}\left(L_{it} - 2L_{ii}\right). \tag{19}$$

The best computational procedure is to obtain, for each item, the value $\left(N\sum_j f_{ij} - f_{ii}\sum f_{ii}\right)$ and record it; then obtain $\left(Nf_{ii} - f_{ii}^2\right)$ and record it. It will be noticed that the latter is $L_{ii}$ while the former is $L_{it}$. Then subtract $2L_{ii}$ from $L_{it}$ and record the difference with its algebraic sign. Finally, . . . the substantially negative . . . so that their rows and columns are removed, those with small original negative $S$'s may now have only positive covariances with the remaining items, and their covariance totals may now exceed their variances, giving them a positive S-index. Therefore, the large negatives should be eliminated first and the indices recalculated. Also, as $n$ diminishes, the increase in the ratio . . . sometimes more than compensates for the increase to be gained in $r_{tt}$ by rejecting one or two small negatives.
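
Steps (c) through (e) can be illustrated with a short Python sketch that derives, for each item, the difficulty, the item-test covariance, the point-biserial item-test correlation and the selection index from the F-matrix row totals. The function name `item_statistics` and the three-item F-matrix are hypothetical and are assumed only for the example.

```python
import math


def item_statistics(F, N):
    """Per-item difficulty, item-test covariance, r_it and S-index from an F-matrix."""
    n = len(F)
    sum_f = sum(F[i][i] for i in range(n))
    T = sum(sum(row) for row in F)
    var_t = (N * T - sum_f ** 2) / N ** 2          # test variance, formula (5)
    results = []
    for i in range(n):
        f_ii = F[i][i]
        L_it = N * sum(F[i]) - f_ii * sum_f        # N^2 times item-test covariance
        L_ii = N * f_ii - f_ii ** 2                # N^2 times item variance
        cov_it = L_it / N ** 2
        var_i = L_ii / N ** 2
        if var_i > 0 and var_t > 0:
            r_it = cov_it / math.sqrt(var_i * var_t)
        else:
            r_it = float("nan")                    # undefined when a variance is zero
        results.append({
            "p": f_ii / N,
            "cov_it": cov_it,
            "r_it": r_it,
            "S": (L_it - 2 * L_ii) / N ** 2,
        })
    return results


# Hypothetical F-matrix for 3 items and N = 5 cases.
F = [[4, 3, 2],
     [3, 4, 2],
     [2, 2, 2]]
items = item_statistics(F, N=5)
for k, item in enumerate(items, start=1):
    print(k, item["p"], round(item["cov_it"], 3), round(item["r_it"], 3), round(item["S"], 3))
```

With so few cases and items every S-index in this toy example is negative; the data serve only to show the arithmetic, not a realistic item analysis.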


Procedure When Machine Facilities are Limited

Some of the statistics used in the above formulas are identical with those obtained in the ordinary process of obtaining individual scores and their mean and standard deviation. $T$ is the quantity usually denoted by $\sum X^2$; that is, the sum of the squares of the individuals' total-test scores, and $\sum f_{ii}$ is $\sum X$. The only other statistics needed are $\sum f_{ii}^2$ and $\sum_j f_{ij}$. With regard to the first, it is customary, in developing a test, to obtain the frequency passing each item in order to compute the item difficulties, so nothing unusual is called for. The only extra step required by this method is obtaining, for each item, the quantity $\sum_j f_{ij}$.

Now, as long as items are scored 1 or 0, $\sum_j f_{ij}$ is the same as $\sum X_{i+}$, the sum of the total-test scores of those who gave the correct response to item $i$. This is a cross-product sum and gives rise to the formula

$$L_{it} = N\sum X_{i+} - f_{ii}\sum X. \tag{20}$$

$f_{ii}$ is, of course, identical with $\sum X_i$. The other formulas above may similarly be rewritten, substituting $\sum X^2$ for $T$, $\sum X$ for $\sum f_{ii}$, and $\sum X_{i+}$ for $\sum_j f_{ij}$.
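
The equivalence behind formula (20) can be checked with a small Python sketch: for each item, the total-test scores of the cases who passed that item are summed, and $L_{it}$ is formed without ever building an F-matrix. The function names and the response table are hypothetical, chosen only for illustration.

```python
def cross_product_sums(responses):
    """For each item, sum the total-test scores of the cases who passed it."""
    totals = [sum(row) for row in responses]
    n = len(responses[0])
    return [sum(t for row, t in zip(responses, totals) if row[i] == 1)
            for i in range(n)]


def l_it_from_scores(responses):
    """L_it = N * (sum of X over passers of item i) - f_ii * (sum of X), formula (20)."""
    N = len(responses)
    n = len(responses[0])
    sum_x = sum(sum(row) for row in responses)
    f = [sum(row[i] for row in responses) for i in range(n)]
    sums = cross_product_sums(responses)
    return [N * sums[i] - f[i] * sum_x for i in range(n)]


# Hypothetical data: 5 cases, 3 items, scored 1 or 0.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]
print(cross_product_sums(responses))  # [9, 9, 6], the F-matrix row totals
print(l_it_from_scores(responses))    # [5, 5, 10]
```

The sorting-and-summing operation described in the text corresponds to `cross_product_sums`: one sort per item, followed by adding the total scores of the cases that fall out.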

The most elementary way to obtain $\sum X_{i+}$ would be to hand-sort the answer-sheets. It is quicker and easier (and has other advantages, as will be seen) either to punch item-scores and total-test scores on IBM cards and use a mechanical sorter, or to use “needle-sort” cards, which are punched around the edge by hand and sorted by inserting a sorting needle. This last method was recently tried in order to determine the amount of time it would consume. Total scores were written on the cards, and a sort was made for each item, then the scores of those who had gotten the item correct were added on a desk calculator. There were 124 items and 109 individuals (1 card for each), and the whole operation, through the final summing of the $\sum X_{i+}$'s as a check, took between 19 and 20 hours, or about 9 minutes per item.


This is an over-estimate of the general average because, with that number of items, two cards had to be stapled together for each individual, and they had to be very carefully fitted together so that the holes would be in the right positions. With 100 or fewer items this time-consuming step would be eliminated and the sorts would be quicker too; about 7 minutes per item would be a fair allowance with $N = 100$. With a key-punch and a mechanical sorter the punching and sorting would be quicker, but the real saving in time occurs when a tabulator is available for the adding; here two or two and a half minutes per item is ample allowance.

The full F-matrix method has, however, certain advantages, the most prominent of which is this: if any items have to be eliminated, all that is required is to delete the corresponding rows and columns and to obtain new row sums. If one is using $\sum X_{i+}$, on the other hand, the answer-sheets have to be rescored and new total-test scores punched, and the sorting-and-summing operation repeated. Thus, the F-matrix, though more work at first, is likely to be less work on the whole unless the number of items is large. Of course, if there is no question of eliminating items, but only of evaluating the merits (or demerits) of the items present, the $\sum X_{i+}$ method is the more economical, the greater the number of items involved. The reason for this is that the time it requires increases linearly with the number of items, while the time required by the F-matrix increases as the square: thirty-five minutes for a 15-item test, about 4 hours for a 50-item test, and probably two days for a 100-item test. If the matrix can be broken down into subtests of not more than 15 items each, however, the time required is again a linear function, roughly half an hour for each such subtest. If this cannot be done, it is better to use the $\sum X_{i+}$ method, eliminate all items with negative S-indices, and repeat the scoring, sorting and summing to obtain the final item values.
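
The advantage claimed for the F-matrix, that rejecting an item only requires deleting its row and column and taking new row sums, can be sketched as follows. The helper names `s_indices` and `drop_items` and the four-item F-matrix are hypothetical; the S-index is computed as $(L_{it} - 2L_{ii})/N^2$, as in formula (19).

```python
def s_indices(F, N):
    """S_i = (L_it - 2 L_ii) / N^2 for every item of an F-matrix."""
    sum_f = sum(F[i][i] for i in range(len(F)))
    out = []
    for i, row in enumerate(F):
        L_it = N * sum(row) - F[i][i] * sum_f
        L_ii = N * F[i][i] - F[i][i] ** 2
        out.append((L_it - 2 * L_ii) / N ** 2)
    return out


def drop_items(F, drop):
    """Delete the rows and columns of the rejected items (0-based indices)."""
    keep = [i for i in range(len(F)) if i not in drop]
    return [[F[i][j] for j in keep] for i in keep]


# Hypothetical F-matrix: 4 items, N = 8 cases; the fourth item hangs
# together poorly with the other three.
F = [[4, 4, 3, 1],
     [4, 4, 3, 1],
     [3, 3, 4, 2],
     [1, 1, 2, 4]]
N = 8
before = s_indices(F, N)
after = s_indices(drop_items(F, {3}), N)
print(before)  # [0.0, 0.0, 0.0, -0.5]
print(after)   # [0.125, 0.125, 0.0]
```

In this toy case removing the one clearly negative item raises the indices of two of the remaining items, which is why the large negatives are eliminated first and the indices then recalculated.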

Received April 5, 1954. 

Early publication. 


References 


1. MacLean, A. G. and Tait, A. T. A procedure 
for analyzing a test and maximizing its reli- 
ability. J. exper. Educ., 1954, 22, 3. 

2. Mosier, C. I. A note on item analysis and the 
criterion of internal consistency. Psycho- 
metrika, 1936, 1, 275-282. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954 


Some Relationships Between the MMPI and a Problem Checklist


Robert F. Lockman


Student Counseling Bureau, University of Minnesota


The purpose of this study was to deter- 
mine what, if any, relationships exist be- 
tween a problem checklist used in the Stu- 
dent Counseling Bureau at the University of 
Minnesota and the Minnesota Multiphasic 
Personality Inventory. Berdie (1) related 
this same problem checklist to the Minnesota 
Personality Scale and found in his sample that 
students with low scores (indicative of the 
presence of problems) on various sections of 
the Scale tended to indicate related problems 
on the checklist. An added purpose of the 
present study was to compare checklist re- 
sponses with those in Berdie’s study con- 
ducted eight years earlier.

The problem checklist¹ contains 33 items and instructs the student to check those which he has not adequately solved and to double check those which he wants to discuss with a counselor.

Procedure 

Checklist responses and MMPI T- 
were obtained for 3 
students counseled 
1948-1949 college 
cluded all college an 
whom complete da 
dents with MMPI’s 
eliminated. Cutting 


Results

Checklist Analysis. The most frequently checked items (single and double checks combined) dealt with educational and vocational problems. Over 80 per cent of the men and 70 per cent of the women indicated that they were unable to determine what they were best able to do; over 50 per cent of both sexes did not know what they wanted to do. One reason for these results may be the fact that the Student Counseling Bureau is primarily an educational and vocational guidance center. Other frequently checked problems were concerned with job opportunities, duties, and training requirements and with study habits.

In the personal-social problem area, over 30 per cent of both sexes felt that they lacked self-confidence. Twenty-five per cent of the women felt that they did not have enough to talk about in social situations.

¹ A reproduction of the checklist may be found in Berdie (1).

Investigation of single and double checking of the more frequently expressed problems indicated that the subjects seemed more willing to discuss their educational-vocational problems with a counselor than their personal-social problems. They may have perceived the Student Counseling Bureau more as a place to obtain help on these kinds of problems. Educational and vocational problems are probably more socially acceptable and personally admissible than those dealing with personality and social relations. In attempting to explain this phenomenon, Berdie states that: “Reluctance to discuss certain types of problems may be due to the fact that the students think that nothing can be done about (them). They may consider their personal problems too private to discuss with a relative stranger. . . . When students come to the co . . .

A significant difference (.05 level of confidence) was found between men and women on only one item: “I have been unable to determine what I am best able to do.” Approximately 82 per cent of the men checked it, while only 70 per cent of the women did so. Otherwise, the men and women were roughly equal on relative percentages of problems checked. It is not known whether this is due to the structure of the checklist, the actual incidence of such problems in these groups, or other unidentified variables.



Comparison of Checklist Responses. Comparison of the total percentages of checks in Berdie's study with the present investigation yielded no significant differences between the women in these two samples. However, seven significant differences were found on checklist problems between the male samples. Significantly greater percentages were found by Berdie on two items:²

I usually feel inferior to my associates (.05)
I do not know how to obtain the money I need (.05)

In the present study significantly greater percentages were found on the following items:

I am unable to determine what I would like to do (.05)
I am frequently embarrassed when with others (.05)
I have so much outside work that I am neglecting my school work (.05)
I do not know how to take good lecture notes (.01)
I am not interested in my studies (.01)

The above differences may be a function of sample sizes and composition (e.g., the loading in the present study of returned servicemen), an actual change in student problems over a period of time, or of other factors not readily apparent.

² The numbers in parentheses following the item indicate the level of confidence.

MMPI Characteristics of Groups Checking 
Many and Few Problems. The median total 
number of problems checked, regardless of 
their nature, was four for the men and five 
for the women. The male average was 4.8 
with a standard deviation of 22.9; the female 
average was 4.9 with a standard deviation of 
12.4. Thus, the men were nearly twice as 
variable in the sheer number of problems 
they checked as were the women. On the
basis of these statistics, the male and female 
groups were separated into two groups: the 
“High” group (checking five-or-more prob- 
lems) and the “Low” group (checking four- 
or-less problems). Critical ratios were com- 


Table 1 
Comparisons of Mean T-Scores of High and Low Problem Groups 
Men Women 
a 
i Low High 
Low High aa eGo 

MMPI (N = ‘{90) (N = 145) (N = 63) (WV ) 
c / 

Scale Mean S.D. Mean S.D. CR rp Mean S.D. Mean S.D. CR ro 
: 5 = = 00 00 S01 OF — = 
: 5 0.0 3 50.1 - 

p Pe ve me 29 261% 17° 522 40 517 44 (O71 08 
x 533 do sse 7.0 Sie =a 49 MS 66 A 28°, 
sa 86524 9a 5AF S° ga 526 87 3.82" Al 
_ 510 7g ge b «(0G =o 50 489 77 09 Ot 
D a1 96 gs 119 28A 28h 78 322 s1 143 1 
Hy s4 7. 0.56 —.04 3 53.7 «9.2 k i 
Pa a T P he 208" = 14° 54.4 10.7 56.9 11.3 1.28 per 
= en ee au 478 131 506 85 Lal 16, 
Pa 567 W7 og 24E d8 53.0 8.5 566 86 2.38" —.26° 
rE as 10e -ge 528 80 564 93 232% —.26° 
Se mae ee ae on 3.97%* —.28°° 545 7.1 58.2 103 asi = 
Ma 558 82 6 2s i8 =P 550 111 574 118 ie cu, 
Tr 5 ma oe 100 546 —.37°° 50.0 7.5 334 10.9 2: i 

l. 3 53. . - 


* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.
° Significantly different from zero at the .05 level of confidence.
°° Significantly different from zero at the .01 level of confidence.


puted between the High and Low groups on 
each MMPI scale. The results are presented 
in Table 1. 
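
For readers unfamiliar with the statistic, a critical ratio is the difference between two group means divided by the standard error of that difference. The sketch below uses one common large-sample form; the article does not state which formula was used, so the function `critical_ratio`, its use of $N$ rather than $N - 1$, and the T-score summaries are all assumptions made for illustration.

```python
import math


def critical_ratio(mean_low, sd_low, n_low, mean_high, sd_high, n_high):
    """Difference between two group means over the standard error of the difference."""
    se_diff = math.sqrt(sd_low ** 2 / n_low + sd_high ** 2 / n_high)
    return (mean_high - mean_low) / se_diff


# Hypothetical T-score summaries for a Low and a High problem group.
cr = critical_ratio(50.0, 10.0, 100, 53.0, 10.0, 100)
print(round(cr, 2))  # 2.12
```

A positive value here means the High group has the larger mean; the ratio is then referred to the normal curve to judge significance.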

Both the male and female High groups were significantly higher than the Low groups on the F, Pa, Pt, Sc, and IE scales. The F scale indicates “faking bad” or inability to comprehend the inventory items. The Pa scale indicates tendencies toward sensitivity, hostility, and difficulty in taking criticism. The Pt scale indicates tendencies toward anxiety, indecisiveness, and feelings of inadequacy and insecurity. The Sc scale indicates tendencies toward fantasy, shyness, and withdrawal. The IE scale indicates tendencies toward social introversion (2).

The male High group was also significantly higher than the Low group on the D scale (indicating depression, discouragement, or dejection of a situational or prevailing nature) and the Pd scale (indicating nonconformity, irresponsibility, impulsiveness, and asociality).

Both the male and female Low groups were significantly higher than the High groups on the K scale. This scale indicates test-consciousness, defensiveness, and an attitude of problem denial. The male Low group was also significantly higher than the male High group on the L scale, a measure of the degree to which the subject may be attempting to falsify his scores by always choosing the response that puts him in the most socially acceptable light.

Biserial r's (see Table 1) for all of the above comparisons were significantly different from zero, but the amount of overlap of the High and Low groups was too great to enable accurate classification into these groupings solely on the basis of MMPI scores. Nor would the number of problems an individual checked be effective in predicting his MMPI scores. The significant differences obtained, then, are chiefly statistical rather than practical in nature. Only tendencies for these groups may be legitimately pointed out on the basis of these differences. It does seem, however, that individuals who check many problems in this sample tend to have somewhat more deviant MMPI profiles than those who check few problems, although those who check few problems may be denying the existence of other difficulties (high K score).



Checklist Responses of Subjects Grouped 
According to Their Highest MMPI Scale 
Score. Another method of treating the data 
was to group the men and women separately 
according to their highest score on the MMPI 
clinical scales. An individual’s highest scale 
score would be the one indicated by the high- 
est “peak” on his MMPI profile, regardless of 
score magnitude. For both men and women,
approximately 50 per cent of each group 
checked problems 6 and 10 on the checklist. 
These were the most frequently checked items 
for the whole sample, so they are valueless as 
far as differential prediction is concerned. 

Half of the men with highest scores on the D, Pt, and IE scales indicated on the checklist that they lacked self-confidence. In other words, there was a tendency in this sample for an admitted lack of self-confidence to accompany characteristics assessed by the MMPI as depression, anxiety, indecisiveness, compulsiveness, feelings of inadequacy and insecurity, and withdrawal tendencies.

Half of the men whose highest score was on the Pa scale checked problems related to a lack of job information and reading difficulties. High Pa scores are interpreted as indicative of sensitive, hostile, and paranoid tendencies.

Half of the women with highest scores on the Sc and IE scales stated on the checklist that they did not have enough to talk about in company. Sc and IE peaks are indicative of shy, withdrawing, socially introvertive behavior. Half of the women with Pa peaks also indicated that they lack job information as did half of the men with the same highest MMPI score.

In general, there seems to be some logical correspondence between several of the checklist problems and personality characteristics

as assessed by the MMPI. This relationship 
Is more obyj 


scales than į 
Since the 
the high 
N f 




able that with sufficiently large homogeneous 
MMPI scale groups, differential problem syn- 
dromes might be found on the checklist. 
Pattern analysis of both the checklist and 
the MMPI (3, 4) and their interrelations 
might also prove to be a fruitful technique. 
The value of such research would be in ob- 
taining stable correlates of personality with 
respect to expressed problems and stated 
needs as indicated by the problem checklist.


Summary 


Analyses of the problem checklist and its 
relations to the MMPI showed that: 

1. The most frequently checked problems dealt with educational and vocational difficulties.

2. Men students were nearly twice as variable in the number of problems they checked as were the women students, although the average number of problems checked by each sex was roughly the same.

3. Over a period of time, the relative percentages of responses on the checklist items did not appreciably change for the two samples compared.

4. The subjects seemed initially less reluctant to discuss recognized educational-vocational problems than recognized personal-social problems with a counselor.

5. Both men and women students who checked five-or-more problems on the checklist (as opposed to those who checked four-or-less) had statistically, though not practically, significantly higher mean scores on the F, Pa, Pt, Sc, and IE scales and significantly lower scores on the K scale of the MMPI. Men students checking five-or-more problems also had significantly higher Pd and D scores and significantly lower scores on the L scale. Biserial r's for all of the above comparisons were significantly different from zero.

6. Aside from the most frequently checked problems in the whole sample, half of the men students with MMPI peaks on D, Pt, and IE felt that they lacked self-confidence; half of the women students with Sc and IE peaks felt that they did not have enough to talk about in company; half of both men and women with Pa peaks indicated a lack of job information, while these men also checked problems dealing with reading difficulties. Extreme caution is needed in generalizing from these results since the criterion groups were too small in most instances for stability or validity of results derived from them. These data, then, should be considered merely as descriptive.


Received August 21, 1953. 


References 


1. Berdie, R. F. An aid to student counselors. Educ. psychol. Measmt., 1942, 3, 281-290.
2. Hathaway, S. R. and McKinley, J. C. Manual for the MMPI. New York: Psychological Corp., 1945.
3. Hathaway, S. R. and Meehl, P. E. An atlas for the clinical use of the MMPI. Minneapolis: University of Minnesota Press, 1951.
4. Meehl, P. E. Configural scoring. J. consult. Psychol., 1950, 14, 165-171.



THE JOURNAL OF APPLIED PSYCHOLOGY 
Vol. 38, No. 4, 1954 


Facilitating Legislative Research


Harry A. Grace


Michigan State College¹


Legislative behavior has been of periodic interest to many psychologists. Two methods of analysis have been used. A small sample of issues is selected and legislators compared according to their votes on these topics (1, 10, 15). Or a few legislators have been studied on many topics (5, 6, 7). Such restricted studies concentrate on a few legislators and a small number of topics. The basic paradigm for these legislative studies does not differ radically from the familiar sociometric analyses of industrial and social psychologists (2, 8).

A major limitation has been the difficulty of tabulating joint voting (9). Associated with this weakness are other shortcomings. Reliability studies of voting are almost nonexistent (4). Data are presented in tabular form and thus relationships among these data remain vague (16, 18, 20, 22).

This paper reports a method for rapid tabulation of such data. In final form the data are in a symmetric matrix to which a variety of statistics may be applied.


Procedure


The official legislative journals provide the records from which data are obtained. Information in these records describes the men, their districts, the issues upon which they vote, and the roll call votes they cast. Thus, in our analyses we may control for the legislative body, time of meeting, topics, chairman, etc.

¹ Dr. Gloria Lauer Grace assisted in the design of these studies. The studies have been financed by the University Research Board, University of Illinois, 1950-1952, and the All-College Research Committee, Michigan State College, 1952-1953. Leonard P. Staugas, Statistical Service Unit, University of Illinois, designed the wiring of the accounting machine, Types 402-403. Victor E. Buys, Supervisor of Tabulating Operations, Statistical Methods Section, Division of Disease Control, Records, and Statistics, Michigan State Department of Health, designed the wiring of the electronic statistical machine, Type 101. Norma E. Taschner, Tabulating Office, Michigan State College, and Doris L. Duxbury, Statistical Methods Section, Michigan State Department of Health, were most cooperative in permitting the use of their IBM facilities.

The data are transcribed on standard mark-sensing IBM cards. This process is rapid. The card accommodates up to 54 simple items of data. More than one card may be used to transcribe larger legislative bodies. The roll calls list men alphabetically, and so men are assigned to columns on the card in alphabetical order. The content of the topic on which the vote is taken may also be coded on the card, as may other control information. If each vote is coded in chronological order, easy reference may be made to the journal to check discrepancies. All data are marked on the card by electrographic pencil.

One card is used for each type of vote. If only split roll call votes are tabulated, this means a minimum of two cards (affirmative and negative), and a maximum of four cards (affirmative, negative, absent, abstain) for each vote. A 1 is marked in a man's column on the card which represents the type of vote he has cast. If he does not vote one way, he votes another. Therefore, he will have a 1 in one and only one card for each vote. The other cards for that vote will be blank in his column.

Punching the cards is accomplished by machine. It is advantageous for comparative-historical analyses to have the data arranged in a definite, permanent order. A suggested order is numerical, according to the number of the district represented by the legislator, with the First District in column one, and so forth. Thus, if men should fail at the polls, retire, or die, the position of the district representative is unchanged. When the reproducing punch is wired for mark-sensing, the data may be rearranged from the alphabetical order of the men to the numerical order of the district.

The cards are prepared for checking by sorting them on the basis of the vote numbers. The accounting machine (Type 402) is wired for addition, printing a minor program total each time the vote number changes. Each vote is listed with its content and the identification of the legislature, and the total of all the cards for each vote is printed. See Table 1. If the mark-sensing and punching are correct, a series of 1's appears in the columns representing the legislators. If a zero appears, it means that the legislator has been overlooked. If a 2 or greater appears, the man has been given credit for having cast more than one type of vote on an issue. Correction of these errors may be made by referral to the pencil markings on the cards. If the cards have been incorrectly marked, reference must be made to the journal. The method is remarkably accurate. The importance of having a machine check rather than a hand check cannot be overestimated in accuracy and amount of time saved. Should subdecks for controls be reproduced from the master deck, it is highly advantageous that these also be machine checked. The investigator is then sure of a perfect working deck at all times. If errors later appear, he knows that they are a function of the machine operations and not the cards.

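
The verification run can be mimicked in Python: the cards belonging to one vote are added column by column, and any legislator whose total is not exactly 1 is flagged. The function names and the five-legislator example are hypothetical; the article performs this check on an accounting machine rather than in software.

```python
def column_totals(cards):
    """Add the cards of one vote column by column (one column per legislator)."""
    return [sum(column) for column in zip(*cards)]


def flag_errors(totals):
    """Map column number (1-based) to its total wherever the total is not exactly 1."""
    return {col: total for col, total in enumerate(totals, start=1) if total != 1}


# Hypothetical vote with five legislators and two cards (affirmative, negative).
affirmative = [1, 0, 1, 1, 0]
negative = [0, 1, 0, 1, 0]
totals = column_totals([affirmative, negative])
print(totals)               # [1, 1, 1, 2, 0]
print(flag_errors(totals))  # {4: 2, 5: 0}
```

Here legislator 4 has been credited with two types of vote and legislator 5 has been overlooked, the two kinds of error the printed totals are meant to expose.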
The final process is the tabulation of the joint-occurrence matrix.² Two methods are available. Either the accounting machine (Type 402-403) or the electronic statistical machine (Type 101) may be used. The accounting machine takes about four times as long and is liable to greater error than the electronic statistical machine. The essential


Table 1

Facsimile of the Verification of Voting Data

(Columns 1-20 represent legislators; a zero [0] or two [2] indicates an error for that man on the particular vote.)

Vote Number    Legislators

l unr rirrai irili 

2? fitan rairiiiktitilti] 

o gailirgaiiagsiiriiiyi 

eTUCUrT SES ally heer 
f4iinigvatiitr¢1itat! 


² The joint-occurrence matrix will be referred to as the jo-matrix.



Table 2 


Facsimile of the Joint-Occurrence Matrix 
(Column 1 and row 1 identify the legislator from 
district one, etc. The diagonal is constant, showing 
the number of times each man voted. The other cells 
indicate the number of times each man has voted with 
every other man. The symmetry of the matrix indi- 
cates that the data are correctly tabulated.) 


Legislators 


Legislators 1 2 3 4 5 
1 167 95 88 74 137 
2 95 167 130 120 110 
3 88 130 167 105 97 
4 74 120 105 167 85 
5 137 110 97 85 167 


task is to instruct the machine to record the 
number of times every man votes (has a 1 
in his column) with each other man. 

For both machines the cards must be 
sorted one column at a time. All cards which 
have a 1 in the sorted column are fed into 
the machine for tabulation of the jo-matrix. 
The matrix must be symmetrical. See Table 
2. This is the check on the tabulation. 
Cards may be summary punched with the 
same totals that appear on the printed forms. 
These summary cards may be useful for fur- 
ther matrix manipulation or for larger sum- 
maries of the data, if these are part of the 
experimental design. 
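
The sort-and-tabulate procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the five "cards" are made-up data, the names `joint_occurrence` and `is_symmetric` are hypothetical, and a list of 0/1 rows stands in for the punched deck.

```python
# Each row is one card (one roll call); each column is one legislator.
# A 1 means the man voted; the data below are invented for illustration.
votes = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]


def joint_occurrence(cards):
    """Tabulate the jo-matrix by 'sorting' on one column at a time."""
    n = len(cards[0])
    jo = [[0] * n for _ in range(n)]
    for sorted_col in range(n):
        # Keep only the cards that have a 1 in the sorted column.
        selected = [card for card in cards if card[sorted_col] == 1]
        # Add up the 1's in every column of the selected cards.
        for card in selected:
            for other in range(n):
                jo[sorted_col][other] += card[other]
    return jo


def is_symmetric(matrix):
    """The check on the tabulation: cell (i, j) must equal cell (j, i)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


jo = joint_occurrence(votes)
for row in jo:
    print(row)
print("symmetrical:", is_symmetric(jo))
```

The diagonal of the result gives the number of times each man voted, as in Table 2.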

The 402 machine allows us to compare as 
many as 12 columns at a time. Each time a 
run is made, the control wire must be moved 
to the column on which the cards have been 
sorted. If n> 12, the wiring must be 
changed to pick up from the next set of 12 
columns, 13-24, etc. We then begin sorting 
with column 1 and again run the entire 
gamut. Machines normally emit an impulse
when two readings are unequal. The problem
of wiring is to allow an impulse to be
freed when two readings are equal, i.e., when
there is a 1 in the sorted column and in any
of the other columns being compared. This
is accomplished by wiring from the comparing
exit to the pilot selectors' digit pickup.
The machine is wired for addition, minor
programming, and printing of totals. If a
percentage matrix is desired, the reciprocal of


the total number of votes is emitted into a 
counter entry. The number of significant 
digits required for accuracy must be borne in 
mind in computing this reciprocal. 
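
The caution about significant digits can be illustrated with the jo-matrix of Table 2, where each man voted 167 times. The sketch below is not the machine procedure itself; the function name `percentage_matrix` and the choice of three versus six decimal places are assumptions made only to show the effect of a coarse reciprocal.

```python
# jo-matrix from Table 2; the diagonal (167) is the total number of votes.
jo = [
    [167, 95, 88, 74, 137],
    [95, 167, 130, 120, 110],
    [88, 130, 167, 105, 97],
    [74, 120, 105, 167, 85],
    [137, 110, 97, 85, 167],
]
total_votes = 167


def percentage_matrix(matrix, total, decimals):
    """Multiply every cell by a reciprocal carried to `decimals` places."""
    reciprocal = round(1 / total, decimals)
    return [[round(100 * cell * reciprocal, 1) for cell in row] for row in matrix]


coarse = percentage_matrix(jo, total_votes, 3)  # reciprocal taken as .006
fine = percentage_matrix(jo, total_votes, 6)    # reciprocal taken as .005988

print("diagonal, coarse reciprocal:", coarse[0][0])
print("diagonal, fine reciprocal:  ", fine[0][0])
```

With too few digits in the reciprocal the diagonal no longer comes out at 100 per cent, which is an easy check on whether enough digits were carried.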

The 101 machine compares as many as 60
columns at one time. The deck is sorted on
one column and all cards with a 1 punch are
run through the machine. The machine will
print the total jo's for 60 men with no wiring
changes necessary. The essential wiring for
the 101 machine provides that a 1 from the
digit emitters be fed into the recode pickup,
which has been wired so that the impulse
passes from column to column. The recode
selectors are also wired together. A wire runs
from the count to the recode selectors, and
then from the unit counters to count return.
At least one subtraction plug must be wired.
The sort select switch is set at the 2 position.
If the legislature exceeds 60 men, it is profit- 
able to make enough decks to account for all 
of the men without rewiring the 101 machine. 
If there were 90 men, deck A could list men 
from districts 1-30 and 31-60; deck B from 
districts 1-30 and 61-90 (in columns 31-60); 
and deck C districts 31-60 (in the first 30 
columns) and 61-90 (in the second set of 30 
columns). The printed matrix is then spliced 
together. This method avoids the necessity 
for wiring changes and may also be employed 
with the 402 machine. 
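
The three-deck arrangement for 90 men can be checked with a short sketch. The random votes, the names `tabulate`, `decks`, and `spliced`, and the dictionary used to hold the printed cells are all assumptions for illustration; only the 60-column limit and the A, B, C layout come from the text.

```python
import random

MACHINE_WIDTH = 60  # the 101 compares at most 60 columns per run


def tabulate(cards, columns):
    """jo counts for the legislators listed in `columns` (60 at most)."""
    assert len(columns) <= MACHINE_WIDTH
    return {
        (a, b): sum(card[a] * card[b] for card in cards)
        for a in columns
        for b in columns
    }


random.seed(0)
n_men, n_votes = 90, 40
# Invented 0/1 votes: one row per roll call, one column per legislator.
cards = [[random.randint(0, 1) for _ in range(n_men)] for _ in range(n_votes)]

first = list(range(0, 30))    # districts 1-30
second = list(range(30, 60))  # districts 31-60
third = list(range(60, 90))   # districts 61-90
decks = {"A": first + second, "B": first + third, "C": second + third}

# Splice the three printed reports into one 90 x 90 matrix.
spliced = {}
for columns in decks.values():
    spliced.update(tabulate(cards, columns))

full = [[spliced[(a, b)] for b in range(n_men)] for a in range(n_men)]
print(len(spliced), "cells tabulated from", len(decks), "decks")
```

Every pair of men appears together in at least one deck, so the spliced matrix is complete without any rewiring.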

Since each investigator has his special prob- 
lems of design, he will interpret these meth- 
ods to suit himself. The 101 machine is the 
better one for even the smallest matrices. 
Fewer wiring problems are encountered, less 
time consumed, and the report is more readily 
checked. On the other hand, the 402 ma- 
chine is more readily available at present in- 
stallations. 


Discussion 


The application of this method to psycho- 
logical research may be made more explicit. 
This method may be applied to any dichoto- 
mous data. The vote is an excellent example. 
Sociometric choices provide a further major 
field of application. 

Voting analyses and sociometrics have been 
criticized for failing to report reliabilities. 
We often study a handful of votes or administer
one sociometric and hope to describe or
predict behavior. This tabulation technique
makes possible the study of large numbers of
votes and sociometric runs. Time or content
matrices may be compared with their counterparts
representing other time or content samples.
To the degree that the jo's are similar,
we may speak of the S's as being consistent
and/or our measures as reliable. The application
of this method to one legislature has
been reported in the literature (12). We
have since applied it to eight others. We
found that behavior is significantly more
consistent from issue to issue than from time
to time. The research possibilities and practical
applications are as broad as the ingenuity
of the investigator.
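
One way to express how similar two jo-matrices are is sketched below. The article does not name an index of similarity, so the product-moment correlation over the off-diagonal cells is an assumption, as are the function names and the second matrix, which is invented to stand for another time or content sample. The first matrix is that of Table 2.

```python
from math import sqrt


def off_diagonal(matrix):
    """Cells above the diagonal; the matrix is symmetrical, so these suffice."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


jo_first = [  # Table 2
    [167, 95, 88, 74, 137],
    [95, 167, 130, 120, 110],
    [88, 130, 167, 105, 97],
    [74, 120, 105, 167, 85],
    [137, 110, 97, 85, 167],
]
jo_second = [  # hypothetical second sample, invented for illustration
    [150, 90, 80, 70, 120],
    [90, 150, 115, 110, 100],
    [80, 115, 150, 95, 85],
    [70, 110, 95, 150, 80],
    [120, 100, 85, 80, 150],
]

similarity = pearson(off_diagonal(jo_first), off_diagonal(jo_second))
print("correlation between the two jo-matrices:", round(similarity, 3))
```

The diagonal is left out because it shows only how often each man voted, not with whom.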

We have alluded to the fruitfulness of hav- 
ing the data in matrix form. The reason for 
this is the development of matrix algebra and 
its application in factor analysis. As these
statistical techniques become more refined,
matrix data will assume greater importance.
A few other possibilities are latent structure
analysis (factor analysis applied to joint
proportion) (13, 14), the difference method (19),
matrix squaring or cubing (8), and the application
of information theory (11). In addition
to these refined techniques, clusters may
be arbitrarily selected from the matrix without
such refinement (3, 17, 21).

Finally, a major value of this method for 
applied studies is the speed with which the 
data may be tabulated. A legislature's votes
on any day may be coded, punched, checked,
and the jo-matrix tabulated overnight. Thus
a daily record may be kept of voting blocs.
Weekly, monthly, or yearly summaries may
be assembled. Matrices may also be tabulated
according to special-interest legislation.
In this manner, a legislator, citizenship
committee, civic interest group, or social scientist
could have at hand a daily, topic summary of
the policy-body's voting patterns. A precise
account of a group's sociometric development
could similarly be made.


Summary 


A method for the quantitative treatment of
dichotomous data is reported. This IBM
method proceeds quickly from written records
to matrix form. Cards are mark-sensed
with the data and punched and checked by
machine. A matrix of joint-occurrences is
completed by either of two IBM machines.
The matrix has many practical applications.
The method facilitates rapid, accurate analysis
of political bodies, sociometrics, and other
social data in dichotomous form.


Received September 5, 1953. 




References


1. Ash, P. The "liberalism" of Congressmen voting
for and against the Taft-Hartley Act. J.
appl. Psychol., 1948, 32, 636-640.

2. Bales, R. F. Interaction process analysis.
Cambridge, Mass.: Addison-Wesley, 1950.

3. Beyle, H. C. The analysis of attribute-cluster-blocs.
Chicago: Univ. of Chicago Press, 1931.

4. Brimhall, D. R. and Otis, A. S. Consistency of
voting behavior by our congressmen. J. appl.
Psychol., 1948, 32, 1-14.

5. Carlson, H. B. and Harrell, T. W. Voting groups
among leading congressmen obtained by means
of the inverted factor technique. J. soc.
Psychol., 1942, 16, 51-61.

6. ...en, J. B. Note on Carlson and Harrell's
Factor analysis of voting among Congressmen.
J. soc. Psychol., 1944, 20, 313-314.

7. Eberhart, J. C. Determinants of legislative
behavior in the U. S. House of Representatives.
Psychol. Bull., 1942, 39, 595.

8. Festinger, L. The analysis of sociograms using
matrix algebra. Human Relat., 1949, 2, 153-158.

9. Fletcher, Mona. The use of mechanical equipment
in legislative research. Ann. Amer.
Acad. polit. soc. Sci., 1938, 195, 168-175.

10. Gage, N. L. and Shimberg, B. Measuring senatorial
"progressivism." J. abnorm. soc. Psychol.,
1949, 44, 112-117.

11. Garner, W. R. and Hake, H. W. The amount
of information in absolute judgments. Psychol.
Rev., 1951, 58, 446-459.

12. Grace, H. A. A quantitative case study in policy
science. J. soc. Psychol., in press.

13. Green, B. F., Jr. A general solution for the
latent class model of latent structure analysis.
Psychometrika, 1951, 16, 151-166.

14. Green, B. F., Jr. Latent structure analysis and
its relation to factor analysis. J. Amer. Statist.
Ass., 1952, 47, 71-76.

15. Harris, C. W. A factor analysis of selected senate
roll calls, 80th Congress. Educ. psychol.
Measmt, 1948, 8, 582-591.

16. Keefe, W. J. Party government and lawmaking
in Illinois General Assembly. Northwestern
Univ. Law Rev., 1952, 47, 55-71.

17. Klingberg, F. L. Studies in measurement of the
relations among sovereign states. Psychometrika,
1941, 6, 335-352.

18. Lowell, A. L. The influence of party upon
legislation in England and America. Ann.
Rep. Amer. Hist. Ass., 1901, 1, 321-542.

19. Osgood, C. E. and Suci, G. J. A measure of relation
determined by both mean difference and
profile information. Psychol. Bull., 1952, 49,
251-262.

20. Rice, S. A. Quantitative methods in politics.
New York: Knopf, 1928.

21. Tryon, R. C. Comparative cluster analysis.
Psychol. Bull., 1939, 36, 645-646.

22. Turner, J. Party and constituency: pressures on
Congress. Johns Hopkins Univer. Stud. in
hist. polit. Sci., 1951, 69, No. 1.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 4, 1954 


A Comparison of Two Methods of Measuring the Attention- 
Drawing Power of Magazine Advertisements 


Joseph Tiffin and Darvin M. Winick 
Division of Applied Psychology, Purdue University 


In measuring the effectiveness of advertise- 
ments it is of primary importance to be able 
to measure the initial attention-drawing power 
or eye appeal of an advertisement. This fact 
is obvious, since an advertisement which is 
not seen cannot accomplish its intended pur- 
pose. One of the methods which has been 
used to accomplish this measurement of at- 
tention-drawing power is eye movement pho- 
tography. This method produces an objec- 
tive photographic record of the eye move- 
ments of a subject while he is observing one 
or several advertisements. If an experimental 
design permitted pairs of advertisements to 
be presented to the subject, the photographic 
record taken by an eye camera would indi- 
cate which of the two advertisements the sub- 
ject preferred to observe. It is possible, then, 
by totaling the preferences of several sub- 
jects to scale a set of advertisements accord- 
ing to their relative attention-drawing power. 

Although the eye movement photographic 
method produces an objective record of the 
subject’s preferences, the method has several 
disadvantages: 

1. The advertisements must be presented 
to one subject at a time and the presentation 
becomes relatively time consuming if large 
numbers of subjects are to be used. 

2. The transportation and assembly of the 
necessary equipment is cumbersome. 

3. The necessary task of frame by frame 
reading of the film record is laborious and 
time consuming. 

It was the purpose of this study to investi- 
gate the possibility that a less time consum- 
ing method of measuring the attention-draw- 
ing power of advertisements will essentially 
scale advertisements in agreement with the 
scaling produced by eye movement photo- 
graphic methods. Specifically, the investiga- 
tion dealt with the relationship between scal- 
ings produced by a group tachistoscopic 




method and scalings produced by the Purdue 
Eye Camera (2). 


Procedure 


Ten advertisements to be scaled were selected
from current issues of popular weekly
magazines. All advertisements were in color
and full page in size. The subjects for the
study were 154 students in college psychology
classes, education classes, and adult education
classes.


Tachistoscopic Presentation. For the tachistoscopic
part of the study the ten advertisements
were reproduced on 35 millimeter colored
transparencies which were individually mounted in
1%" X 2" cardboard slide mountings. A special
brass holder for pairs of these mounted
transparencies was designed to fit into the slide
carrier of a standard 3¼" X 4" lantern projector.
It was possible, then, to project together on a
screen any two of the ten advertisement slides.
The brass holders could be slipped in and out of
the slide carrier in the same manner as standard
3¼" X 4" lantern slides. In order to speed up
the presentation two of the brass holders were
constructed so that ... readied while another ...
projected ... front lens of the ... time the
advertisements ... on the screen could ... In this
study each possible pair of the ten advertisements
was ... seconds.

... of each ... were asked to indicate on a ...
sheet which advertisement of each ... look at if
they were given ... The preferences of each subject
for each advertisement were then determined from
the answer sheets.


Eye Movement ... Of the subjects participating
in the tachistoscopic presentation, 36 were
randomly ...




Table 1


Mean Number of Preferences for Random Halves of
the Subjects Using a Group Tachistoscopic
Presentation


                 Mean Number of Preferences for
                       Halves of Subjects

Advertisement       Random        Random
                    Half A        Half B
 1                   7.10          7.09
 2                   6.54          6.64
 3                   5.70          5.22
 4                   5.26          5.43
 5                   3.82          4.23
 6                   3.92          3.93
 7                   3.66          3.78
 8                   3.21          3.17
 9                   3.13          3.05
10                   2.65          2.45


... mirror placed directly in front of the ...
and an eight millimeter motion picture camera
mounted in front of and above the mirror. A motor
drive is used to keep the camera speed constant
at 2.7 frames per second. The subject is ... in
front of the stand and is able to view the
advertisements through the half silvered mirror.
The reflection on the mirror of the upper part of
the subject's face is photographed by the motion
picture camera.¹

It is possible, by projecting the produced film
a single frame at a time, to identify which
advertisement the subject is looking at in each
frame. A 35 millimeter strip film projector
which had been converted for use with eight
millimeter film was available for this single frame
projection. A count of the number of frames
during which the subject's eyes were fixed on a
particular advertisement gives a measure of the
amount of time in which the subject was looking
at that advertisement. This amount of time
spent on an advertisement was used as one measure
of attention-drawing power. Another measure of
attention-drawing power, the total first fixations
on a particular advertisement, was also used.
This was a measure of the number of times a
particular advertisement was looked at first by
the subjects during the paired presentation.

If each subject was to view all possible pairs
of ten advertisements, it would be necessary to
present 45 pairs and each advertisement would
be presented nine times. In the eye camera
experiment, however, it was felt that each subject
should view each advertisement only once. In
order to accomplish this only five pairs were
presented to each subject. In this manner nine
subjects were needed to complete each total pairing.
The 36 subjects used in the eye camera experiment
actually represented four complete pairings
of the ten advertisements.

¹ A more detailed description of the camera can be
found in an unpublished thesis by Karslake (3), and
in an article by the same author (2).
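
The arithmetic of the pairing design (45 pairs, five pairs per subject, nine subjects per complete pairing) can be sketched with a round-robin schedule. The article does not say how pairs were assigned to subjects, so the circle method used here, and the name `round_robin`, are assumptions made only to show that such an assignment exists.

```python
def round_robin(items):
    """Circle method: n - 1 rounds in which every item appears exactly once."""
    items = list(items)
    n = len(items)  # assumed even
    rounds = []
    for _ in range(n - 1):
        # Pair the first with the last, the second with the next-to-last, etc.
        rounds.append([(items[i], items[n - 1 - i]) for i in range(n // 2)])
        # Hold the first item fixed and rotate the others by one place.
        items = [items[0]] + [items[-1]] + items[1:-1]
    return rounds


rounds = round_robin(range(1, 11))  # advertisements 1-10
for subject, pairs in enumerate(rounds, start=1):
    print("subject", subject, pairs)
```

Each of the nine rounds can be given to one subject, who then sees every advertisement once; four such sets account for the 36 subjects. Left and right position on the screen is not handled in this sketch.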


Results 


Reliability. From the results of the ta- 
chistoscopic presentation the preferences of 
all 154 subjects for each advertisement were 
totaled. The subjects were then randomly 
split into two groups and the product mo- 
ment correlation (1) between the total pref- 
erences of these groups was computed and 
used as a measure of the reliability of the 
tachistoscopic method. The split-half cor- 
relation found was .98. When the Spearman- 
Brown formula (1) was applied to find an 
estimate of the expected correlation for double
the number of judges, an r of .99 was
found. In Table 1 the mean preferences for
each half of the subjects are shown. 
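
The Spearman-Brown step can be written out as a short worked example. The function name `spearman_brown` and the parameter `k` (the factor by which the number of judges is multiplied) are illustrative choices; the three split-half values are those reported in the text for the tachistoscopic method and the two eye camera measures.

```python
def spearman_brown(r_half, k=2):
    """Expected correlation when the number of judges is multiplied by k."""
    return k * r_half / (1 + (k - 1) * r_half)


reported_split_half = {
    "tachistoscopic preferences": 0.98,
    "eye camera, first looks": 0.58,
    "eye camera, seconds spent": 0.50,
}

for measure, r in reported_split_half.items():
    print(measure, round(spearman_brown(r), 2))
```

For doubled length the formula reduces to 2r / (1 + r).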

In order to investigate the reliability of the 
first eye camera measure, the first looks at 
each advertisement by random halves of the 
subjects were totaled. The r between these
halves was .58. The Spearman-Brown formula
established the r for double the number
of judges to be .73. The mean number of
first looks for random halves of the subjects
is shown in Table 2. In the second eye
camera method the first ten frames of film 
showing the subject looking at a pair of 


Table 2


Mean Number of First Looks for Random Halves of
the Subjects Using the Purdue Eye Camera


                 Mean Number of First Looks for
                       Halves of Subjects

Advertisement       Random        Random
                    Half A        Half B
 1                   6.50          5.50
 2                   6.00          8.00
 3                   5.00          6.00
 4                   6.50          4.00
 5                   2.00          3.50
 6                   3.50          2.50
 7                   3.50          4.00
 8                   4.50          2.50
 9                   3.50          4.00
10                   4.00          5.00




advertisements were considered. The num- 
ber of frames in which the subject looked at 
a particular advertisement was first deter- 
mined, and from this the total number of 
frames in which random halves of the sub- 
jects looked at each advertisement was to- 
taled. Since the eight millimeter camera was 
motor driven and its speed constant at 2.7 
frames per second, it is possible to convert 
the total frames measure into total seconds 
looked at a particular advertisement. In 
Table 3 the mean number of seconds spent 
by random halves of the subjects on each ad- 
vertisement is shown. The r between the two 
halves for this method was found to be .50. 
The Spearman-Brown estimate of reliability 
was .67. 

Comparison of the Different Methods. In 
order to determine the relationship between 
the different methods of measuring the atten- 
tion-drawing power of advertisements, prod- 
uct moment correlations were computed be- 
tween the relative attention values found by 
the tachistoscopic presentation method and 
each of the two eye camera measures. In 
Table 4 the mean number of tachistoscopic 
preferences, first looks, and seconds spent on 
each advertisement are shown for all sub- 
jects. The correlation between the results of 
the tachistoscopic and eye camera, first look, 
methods was found to be .79. A correlation
of .83 was found between the results of the 


Table 3


Mean Number of Seconds Spent by Random Halves of
the Subjects Using the Purdue Eye Camera


                 Mean Time in Seconds Spent by
                       Halves of Subjects

Advertisement       Random        Random
                    Half A        Half B
 1                   21.3          18.7
 2                   21.8          18.3
 3                   18.9          17.8
 4                   15.0          21.3
 5                   13.1          13.0
 6                   15.6          17.4
 7                   15.4          13.7
 8                   13.3          15.0
 9                   16.9          15.6
10                   15.4          15.9




Table 4


Mean Number of Tachistoscopic Preferences, First
Looks, and Seconds Spent for All Subjects


                 Mean Number    Mean Number   Mean Number
Advertisement    of Tach.       of First      of Seconds
                 Preferences    Looks         Spent
 1                  7.10          6.00          20.0
 2                  6.59          7.00          20.1
 3                  5.46          5.50          18.3
 4                  5.34          5.25          18.1
 5                  4.03          2.75          13.0
 6                  3.93          3.00          16.5
 7                  3.72          3.75          14.5
 8                  3.19          3.50          14.1
 9                  3.09          3.75          16.2
10                  2.55          4.50          15.6


tachistoscopic preferences and the number-
of-seconds-spent measure of the eye camera
presentation.
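
The two product moment correlations can be recomputed from the means in Table 4, as a check on the reported .79 and .83. In this sketch the first-looks mean for advertisement 4 is taken as 5.25, the mean of the two halves in Table 2, and the function name `pearson` is an illustrative choice.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Table 4, advertisements 1-10.
tach_preferences = [7.10, 6.59, 5.46, 5.34, 4.03, 3.93, 3.72, 3.19, 3.09, 2.55]
first_looks = [6.00, 7.00, 5.50, 5.25, 2.75, 3.00, 3.75, 3.50, 3.75, 4.50]
seconds_spent = [20.0, 20.1, 18.3, 18.1, 13.0, 16.5, 14.5, 14.1, 16.2, 15.6]

r_first_looks = pearson(tach_preferences, first_looks)
r_seconds = pearson(tach_preferences, seconds_spent)
print("tachistoscopic vs. first looks:  ", round(r_first_looks, 2))
print("tachistoscopic vs. seconds spent:", round(r_seconds, 2))
```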

Since the reliability of the two eye camera
measures was lower than the reliability of the
tachistoscopic measure, probably due primarily
to the small number of complete pairings,
it would be of interest to know what
the correlations between the tachistoscopic
results and each of the two eye camera measures
would be if the latter were perfectly reliable.
These correlations can be estimated
by correcting for attenuation due to the imperfect
reliability of the eye camera (criterion)
measures (1, p. 530). These corrected
correlations were .86 between the tachistoscopic
measures and the eye camera,
first look, measures; and .99² between the
tachistoscopic results and the eye camera
"number of seconds spent" measures.


Summary and Conclusions 


In this investigation it was found that the
attention-drawing power of advertisements
can be scaled by the paired-comparison
tachistoscopic method with high reliability (a
split-half r of .98, or .99 by the Spearman-Brown
estimate). Correlations of .86 and .99 were found
between the tachistoscopic method and two eye
camera measures. These r's were the result
of correcting the obtained correlations for the
unreliability of the eye camera measures.


² Actual arithmetic results in an r of 1.03. For a
discussion of r's greater than unity, see McNemar
(4, p. 136).




The relationships indicate that the group ta- 
chistoscopic method as used in this study will 
scale advertisements in essentially the same 
order as eye camera methods when attention- 
drawing power is considered. This fact is 
important to people interested in advertising
research for several reasons: 

1. The tachistoscopic method lends itself 
easily to group presentation and enables large 
numbers of subjects to be reached. 

2. A standard, easily transportable, slide
projector is the only equipment needed to
make the tachistoscopic presentation.

3. Preferences on prepared answer sheets
may be quickly totaled by hand or machine
methods.

In situations where eye movement photography
could be used to measure the attention-drawing
power of advertisements, the results
of this study indicate that a considerable
saving of time and energy can be effected by
use of a group tachistoscopic presentation.


Received June 19, 1953. 


References 


1. Guilford, J. P. Fundamental statistics in psy- 
chology and education. (2nd Ed.) New 
York: McGraw-Hill, 1950. 

2. Karslake, J. S. The Purdue Eye-Camera: A 
practical apparatus for studying the attention 
value of advertisements. J. appl. Psychol., 
1940, 24, 417-440. 

3. Karslake, J. S. A simple and direct method for 
investigation of the attention value of ad- 
vertising copy through eye movement photog- 
raphy. An unpublished thesis, Purdue Uni- 
versity, 1939. 

4. McNemar, Q. Psychological statistics. New York:
Wiley & Sons, 1949.




Applied Psychology in Action 


Legal Status of Advertising and Marketing Psychology Experts 


An important U. S. District Court decision 
by Judge Robert C. Bell, District of Minne- 
sota, was handed down on September 4, 1953 
at St. Paul, Minn. The Court admitted the 
testimony of two experts in the field of ad- 
vertising and marketing psychology. These 
experts had been engaged by the U. S. Food 
and Drug Administration to interpret adver- 
tising copy and to determine its impact on a 
sample of 200 prospective purchasers of a 
drug. As a result of the success attained it 
is probable that advertising and marketing 
psychologists will be increasingly used in 
prosecutions involving the fraudulent and 
misleading use of labels and advertisements 
in the marketing of drugs and foods. 

The case revolved around a full page news- 
paper advertisement of “Tryptacin.” The 
label on the drug itself did not contain direc- 
tions for use in the treatment of stomach 
ulcers although this is a condition for which 
the drug is intended and for which the drug 
is suggested and recommended in its adver- 
tising. 

The defense contended that the advertise- 
ment represented only that “Tryptacin” is 
intended for use as an antacid or a palliative 
for acid pain. The defense evidence consisted 
of the testimony of two representatives of a 
firm which handles “Tryptacin” advertising 
and the testimony of a number of physicians. 
The two advertising men testified that, in 
their opinion, the advertisement offered 
“Tryptacin” as a means of relieving acid 
pain and not of curing stomach ulcers. They 
also testified that they had shown the adver- 
tisement to a number of their associates in 
the advertising business, to newspaper censor- 
ship boards, and to other persons and not 
a single person received the impression that 
the advertisement offered a cure for stomach 
ulcers. The physicians who testified for the
defense stated that they had discussed the 
advertisement with doctors, nurses, patients, 
and other persons and again no one got the 
idea that the product would cure stomach 
ulcers. The Court noted that these witnesses 


did not offer any written evidence concerning
their interviews. Furthermore, it did not appear
to the Court that the interviews were
systematically conducted. The Court went
on to state: "The likelihood of error [or]
prejudice developing in the course of such
interviews would seem to be great, particularly
since none of the witnesses of claimant,
including both advertising men and [physicians],
were qualified by education or experience [in]
the taking of formal public opinion surveys."
(Italics added)

Judge Bell based his decision on: (1)
reading and examining the advertisement;
(2) hearing the testimony of two experts in
the field of advertising and marketing psychology
(Howard P. Longstaff of the University
of Minnesota and James N. Mosel
of George Washington University); (3) the
testimony of two persons who purchased the
drug in the belief that the advertisement offered
a cure for stomach ulcers; and (4) the
testimony of a specialist in internal medicine
who has treated many cases of stomach ulcers
and who testified that in his opinion the ...
patient would get the impression from the
advertisement that the drug was offered as a
cure for stomach ulcers.

The Court commented on the testimony of
Longstaff and Mosel as follows: "they presented
exhaustive analyses of the content of
the advertisement and the effect which it
was intended to have upon the prospective
purchaser of the drug. Such testimony is admissible
to determine the meaning of an advertisement.
Federal Trade Commission v.
National Health Aids, Inc., 108 F. Supp. 340
(D. Md.).

"Moreover, Dr. Mosel introduced evidence
relative to two hundred individuals whom he
surveyed concerning the impression which
they received from the 'Tryptacin' advertisement.
A substantial portion of those interviewed
indicated that they received the impression
from the advertisement that 'Tryptacin'
would 'stop,' 'cure' or otherwise [bring
about] some permanent relief of ulcers. The
... filled out by the individuals questioned,
... cards, and tabulations made by Dr.
Mosel of the answers received, were placed
[in evidence]."

The Court thereupon upheld the seizure of
... cases, more or less, of the drug and
assessed the costs of the judicial proceedings
against the defense. Source: Letter dated
September 23, 1953 from Division of Regulatory
Management of the Food and Drug Administration
together with enclosures consisting
of Findings of Fact and Conclusions of
Law and Memorandum Opinion.




Reporting Employment Test Scores to Supervisors * 


Clifford E. Jurgensen 


Ass’t Vice President—Personnel, Minneapolis Gas Company 


One of the persistent problems in the field
of industrial psychology is that of reporting
employment test scores to persons untrained
in the field of tests and measurements. Such
persons include supervisors, top management,
and, perhaps, the applicant himself.
It is simple enough to advise that test
scores should not be given persons untrained
in test interpretation. In actual practice,
however, such advice must often be ignored.
Training in test interpretation can and
should be given insofar as is possible. However,
such training cannot possibly reach all
persons involved. Further, it is unlikely that
training can be sufficiently intensive and extensive
to train adequately any of the persons
involved. Therefore, it is necessary and desirable
to simplify test score interpretation to
the greatest possible extent.

The procedure discussed here consists of a
profile chart on which percentile scores are
plotted on a linear continuum. The chart,
shown in Figure 1, is based on normal probability
tables in which percentile ranks are
plotted in accordance with z-score units.
These units effectively overcome the difficulty
presented by the fact that the difference between
the 90th and 99th percentiles is not
equivalent to the difference between the 40th
and 49th percentiles.
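
The spacing of percentile ranks in z-score units can be illustrated with a short sketch. It assumes a standard normal distribution and uses Python's `statistics.NormalDist`; the function name `chart_position` is hypothetical and is not taken from the chart itself.

```python
from statistics import NormalDist


def chart_position(percentile):
    """z-score at which a percentile rank falls on the linear chart."""
    return NormalDist().inv_cdf(percentile / 100)


gap_high = chart_position(99) - chart_position(90)
gap_mid = chart_position(49) - chart_position(40)

print("90th to 99th percentile, in z units:", round(gap_high, 2))
print("40th to 49th percentile, in z units:", round(gap_mid, 2))
```

Both intervals cover nine percentile points, yet the upper one takes up several times as much room on the chart, which is why positions on the profile do not correspond to averaged percentile ranks.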


* This material contains the gist of a part of the
presentation by Jurgensen in a panel discussion on
"Philosophy of Testing" before the Minneapolis
Vocational Guidance Association on April 29, 1954.


Although carefully designed experiments 
with adequate controls are lacking, experience 
indicates that lay people tend automatically 
to make reasonably correct interpretation of 
scores inasmuch as they are likely to interpret
scores on the basis of where the X's appear
on the profile. For example, it is not
uncommon to hear remarks such as "His
score on mechanical reasoning is about half
way between his highest and lowest scores."
Such remarks are based on profile plotting
(and therefore z-scores) and do not correspond
to an average percentile rank.

Although it has been found that lay peo- 
ple typically interpret scores graphically, and 
therefore linearly insofar as standard scores 
are concerned, the profile does give two verbal 
interpretations to facilitate communication or 
record purposes. One of these is the well 
known percentile rank which is labeled on the 
profile as “per cent of group having lower 
score.” The other is a general verbal inter- 
pretation of the score in terms of commonly 
used adjectives. A column headed “Test or 
Measurement” is used to give the type of test 
in functional terms rather than the name of 
the specific test. A column headed “Basis of 
Comparison” is used to give the norm group 
on which test scores are profiled. 

The profile chart mentioned above is a 
simplification of a similar chart used within 
the Personnel Department with persons 
trained in test interpretation. This original 
chart permitted interpretation on four, rather
than two bases. These are: group description,
stanine, standard score, and percentile
rank. Instead of a single column labeled
"Basis of Comparison," the original chart
contained four columns. These consisted of
raw score, transmuted score (percentile rank,
standard score, stanine, or other such score),
norm group, and a fourth column could be
used for any additional data desired. This
original, and more complicated, chart contains
the same advantages as the simplified
version insofar as interpretation based on
location of X's is concerned. However, although
terms such as stanine and standard score do not
affect score interpretation, lay people feel uneasy
about a chart which they do not fully comprehend.
The simplified profile has therefore
been found to contain all of the advantages
without containing the disadvantages
of the original chart.


Fig. 1. Portions of Test Profiles. [Figure not
reproduced. The simplified profile, for supervisors,
applicants, etc., has columns headed "Test or
Measurement," "Percent of Group Having Lower Score"
(scale marked 1, 10, 20, ..., 90, 99), and "Basis of
Comparison," with a "General Interpretation of
Score" scale running Very Low, Low, Average, High,
Very High. The original profile, for the Personnel
Department, carries scales for group description,
stanine, standard score, and percentile rank.]


Book Reviews 


Marketing and Social Research Division of 
the Psychological Corporation. The meas- 
ured effectiveness of employce publications. 
New York: Association of National Adver- 
tisers, Inc., 1953. Pp. 109. $10.00. 

This is a well designed study, the results of which are reported in
a beautiful, lithographed brochure replete with illustrations, [...]
tables of results expressed as percentages, a minimum of verbiage,
and a maximum of white spaced margins. The overall size is 14 inches
by 11 inches. Presumably this is the kind of expensive and
expensive-looking report that consultants and [...] organizations
believe will be read by top brass in business and industry. It is in
marked contrast to the form of report used by scholars in reporting
the detailed [...] quantitative studies in the scientific journals
and monographs. The very form of this report, in this reviewer's
opinion, poses a [...] psychological problem: do high level
executives really prefer this type of advertising layout report? Is
the stereotype of the “business executive” correct which assumes
that [...] before him for his serious attention [...] a study must
be presented in a form so that “he who runs will read”?

But let's get on with a review of the contents of this report. As
the sub-title states, it is a study of readership, penetration, and
readability of seven employee publications. The sponsor was the ANA
Public Relations [...] and the study itself was made by The
Psychological Corporation with Charles L. Vaughn serving as
technical director. It aimed at finding out what employees will read
and believe.

A foreword, an introduction dealing with the role of employee
publications in the total [...] of business communications, the
objectives of the study, the methods used in the investigation, and
a general summary of results are then followed by a pictorial,
tabular, and verbal description of the detailed results. In general,
the results show the employee publication to be one of the best of
available sources of information about the company, better than such
sources as the first-line supervisor, the union steward, and
meetings. An incredible 97 per cent of the 1,800 in-plant interviews
indicate belief in what they
read in the company magazine or newspaper. 
Readership was likewise quite high, namely 
90 per cent reported they had read at least 
one of the two most recent issues and, on the 
average, 78 per cent reported that they read 
the publications regularly. Thoroughness of 
readership, however, was much less. 

The industrial psychologist and the ap- 
plied social psychologist will be especially in- 
terested in the reported relationships between 
“leftist” and “rightist” attitudes of em- 
ployees interviewed and the extent of their 
readership and in the Flesch readability 
scores of these seven publications. In regard 
to the latter, as is usual, the publications are 
written at a level of difficulty that is too high 
for over one-third of the rank-and-file em- 
ployees. Of more importance, little relation- 
ship between readership and readability was 
found. This finding, however, may be re- 
garded as throwing doubt on the method of 
measuring readership which was used rather 
than as evidence tending to discredit the im- 
portance of simplified language in reaching 
employees with limited amounts of education. 

The reviewer has little to criticize with respect to what is
actually presented in this report. Furthermore, there are many
commendable features such as the frankness of the plant-by-plant
comparisons, the wealth of pictorial illustrations of “good” and
“poor” features of these company publications, and the reproduction,
in the appendix, of the interview schedule used to measure
readership and attitudes. It is obvious that high-level professional
competence is reflected. However, one misses any reference to other
relevant studies to which the scholarly business executive could
turn for further information if he so desired. The reviewer suspects
there are many more studious business executives than the
advertising fraternity responsible for the form of the present
report would believe possible. Finally, it would have been of value
to the serious student of industrial communications to give the
statistical constants such as means and standard deviations, and
coefficients of correlation in an appendix so that findings in the
present study might be compared with reports of similar scientific
studies.


Donald G. Paterson 
The University of Minnesota 


Comment on Preceding Review 


Paterson’s remarks in regard to the elaborateness of the
presentation accentuate the rather interesting differences in frames
of reference. After considerable discussion, we decided to make the
publication simple because we felt that executives were tiring of
the glossy four-color jobs! Actually, only the general summary and
preceding page were thought to be of interest to top management, and
these two pages will probably be printed separately for that group.

We were rather concerned with the low and occasionally negative
correlations between Flesch scores and readership. The explanation
[...] What happe[...] and important [...] readably. “Dravo Bids,”
illustrated in our publication, actually tells the workers
indirectly whether they are going to have jobs or not, yet it
abounds with long words and big figures. I can verify from intensive
interviewing of my own that even the poorly educated “sand hogs”
literally pore through it. There are other similar features.

The travesty is that compelling material written at a very difficult
level may lead the inept reader to some rather bizarre conclusions
indeed.


Charles L. Vaughn 
The Psychological Corporation 




Anon. Army personnel tests and measurements. TM12-260, Department of
the Army. Washington, D. C.: U. S. Government Printing Office, 1953.
Pp. 125. $.55.

This is a good little summary of the use of tests and rating
procedures in the Army. It reads much like a standard text on
employment psychology condensed and written down to the level of
readers without a psychology background. For psychologists in the
Army it might serve as a useful refresher and almost approximates a
manual. For other Army personnel needing some familiarity with the
field, it would be very helpful if read carefully and, preferably,
with an elementary statistics text on the side. The monograph covers
test construction, criteria, scoring of tests (especially standard
scores), reliability and validity, the use of profiles in
classification, achievement tests, self-description and rating
scales (including forced choice), test administration and scoring.

The work has a number of commendable features. It is concise and
there is not a word wasted. Effective use is made of graphic
materials, some of them quite ingenious. There is interesting
adaptation of military terminology to conventional psychological
presentation. For instance, reliability and validity are interpreted
in terms of “calculated risks.” The treatment is down to earth and
practical, but entirely scientific withal.

There is always the problem of how to handle statistics in a work
like this. The present authors employ conventional statistical
terminology, but do not indicate how anything is computed. There is
a frequent suggestion that “any statistics book” covers some
particular item. The authors do about as well as could be done under
the circumstances with brief explanations of some statistical
notions and graphic materials to clarify the explanation. According
to an insert the major responsibility of the work appears to have
been carried by Baier, Bayroff, and Rundquist. They are to be
congratulated on having done an interesting and useful minor piece
of work.


Harold E. Burtt 
The Ohio State University 




Buros, Oscar K., editor. The fourth mental 
measurements yearbook. Highland Park, 
N. J.: The Gryphon Press, 1953. Pp. xxiv 
+ 1163. $18.00. 

[...] reviews of earlier editions of the Mental Measurements
Yearbook have begun with accolades. The reviewer of this latest
edition sees no reason to deviate from this [...]. Buros' Fourth
Mental Measurements Yearbook is a monumental work, even longer than
the previous edition and of inestimable value to purveyors and users
of information on tests. “The (825-page) section ‘Tests and Reviews’
lists 793 tests, 596 original reviews by 308 reviewers, 53 excerpts
from test reviews in 15 journals, and 4,417 references on the
construction, validation, use, and limitations of specific
tests. . . . The (267-page) section ‘Books and Reviews’ lists 429
books on measurement and closely related fields and [...] excerpts
from book reviews in 121 journals.” The series of detailed indexes
remains a [...] feature of the volume.

Projective tests, aptitude test batteries, and tests for specific
vocations all receive noticeably more attention in the present
volume than in the Third Yearbook. Some 19 projective tests are
mentioned for the first time in the yearbook series and 631 new
journal references on the Rorschach (one-seventh of the total number
of journal references on tests) bring the total in the yearbook
series to an [...] 1,217. The one page devoted to [...] aptitude
test batteries in the Third Yearbook has become 37 pages devoted to
nine such batteries in the current work. That the aptitude test
battery is a relatively recent development is made clear by the
post-World War II dates on seven of the nine batteries. That 45
tests for specific vocations are listed in the current yearbook, as
opposed to 10 in the Third Yearbook, is partly the result of the
only recently won permission to review several of these tests,
partly a reflection of the continued efforts of professional schools
to improve selection procedures.

Past reviewers have argued for changes in 
editorial policy, notably for the exclusion of 
tests which do not meet certain predeter- 
mined criteria. The present reviewer chooses 
to concern himself with only one aspect of 
editorial policy: the exclusion of tests thor-
oughly reviewed in previous yearbooks for
which there has been no new edition since 
the last yearbook. 

Unless it can be assumed that all yearbook 
users know they must also consult previous 
editions, they may not become aware of the 
existence of some established tests. At least 
a half dozen of the best known, most used 
(and frequently most carefully studied) tests 
of manual dexterity are not mentioned in the 
Fourth Yearbook, nor is the well-known 
Minnesota Clerical Test. Current yearbooks 
should at least list tests previously reviewed 
with a cross reference to the appropriate vol-
ume. Exclusion criteria might be developed
so that such lists would not be cluttered with 
the measurement whims of the century. 

Added features require space, and space 
has always been a problem for Buros. The 
“Books and Reviews” section appears to offer 
less that is new and to serve a more limited 
readership. To the extent that this section is 
a drain on the “Tests and Reviews” section, 
it is here that space economies should be 


effected. 
Charles N. Morris 


Teachers College, Columbia University 


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review
should be sent to Donald G. Paterson, Editor, Department of
Psychology, University of Minnesota, Minneapolis 14, Minnesota.


Problems of consciousness. Harold A. Abramson, Editor. New York: The
Josiah Macy, Jr. Foundation, 1954. Pp. 177. $3.25.

Rorschach responses in the aged. Louise Bates Ames, Janet Learned,
Ruth W. Metraux, and Richard N. Walker. New York: Paul B. Hoeber,
Inc., Medical Book Department of Harper & Brothers, 1954. Pp. 244.
$6.75.

Psychological testing. Anne Anastasi. New York: The Macmillan
Company, 1954. Pp. 240. $4.25.

The exteriorization of the mental body. James Baker, Jr. New York:
The William-Frederick Press, 1954. Pp. 32. $1.50.

Psychology of personnel in business and industry. Second Edition.
Roger M. Bellows. New York: Prentice-Hall, Inc., 1954. Pp. 467.
$7.35.

Employment psychology: the interview. Roger M. Bellows and M.
Frances Estep. New York: Rinehart & Company, Inc., 1954. Pp. 295.
$4.25.

After high school—what? Ralph F. Berdie. Minneapolis: University of
Minnesota Press, 1954. Pp. 240. $4.25.

Columbia mental maturity scale. Bessie B. Burgemeister, Lucille
Hollander Blum, and Irving Lorge. Yonkers-on-Hudson, N. Y.: World
Book Company, 1954. Examiner’s Kit: 100 items, and a comprehensive
Manual. $35.00. Individual Record Blanks are priced at $.85 per
package of 35.

The sociology of work. Theodore Caplow. Minneapolis: University of
Minnesota Press, 1954. Pp. 330. $5.00.

Manual of child psychology. Second Edition. Leonard Carmichael,
Editor. New York: John Wiley & Sons, Inc., 1954. Pp. 1,295. $12.00.

Sociological perspective. Ely Chinoy. New York: Doubleday and
Company, Inc., 1954. Pp. 58. $.85.

Introduction to logic. Irving M. Copi. New York: The Macmillan
Company, 1953. Pp. 472. $4.00.


Symbolic logic. Irving M. Copi. New York: The Macmillan Company,
1953. Pp. 472. $5.00.

Religion and human behavior. Simon Doniger, Editor. New York:
Association Press, 1954. Pp. 233. $3.00.

Production guides and controls for the modern executive. M. J.
Dooher, Editor. New York: American Management Association, 1953.
Pp. 52. $1.25.

Stepping up office efficiency. M. J. Dooher, Editor. New York:
American Management Association, 1953. Pp. 46.

Streamlining office equipment and [...]. M. J. Dooher, Editor. New
York: American Management Association, 1953. Pp. 35. $1.25.

Gearing up for better production. M. J. Dooher, Editor. New York:
American Management Association, 1953. Pp. [...]. $1.25.

The human side of the office manager's job. M. J. Dooher, Editor.
New York: American Management Association, 1953. Pp. 40. $1.25.

A critical look at the insurance buyers [...]. M. J. Dooher, Editor.
New York: American Management Association, 1953. Pp. 35. $1.25.

Maintaining a dynamic insurance program. M. J. Dooher, Editor. New
York: American Management Association, 1953. Pp. 44. $1.25.

Industry at the bargaining table. M. J. Dooher, Editor. New York:
American Management Association, 1954. Pp. [...]. $1.25.

Selling costs and market potential: controls and guides. M. J.
Dooher, Editor. New York: American Management Association, 1954.
Pp. 38. $1.25.

Modern learning theory. William K. Estes, Sigmund Koch, Kenneth
MacCorquodale, Paul E. Meehl, Conrad G. Mueller, William N.
Schoenfeld, and William S. Verplanck. New York:
Appleton-Century-Crofts, Inc., 1954. Pp. 424.



Mind and performance. Harold Kenneth Fink. New York: Vantage Press,
1954. Pp. 113. $3.00.

Human behavior in industry. William W. Finlay, A. Q. Sartain, and
Willis M. Tate. New York: McGraw-Hill Book Company, Inc., 1954.
Pp. 247. $4.00.

A psychological glossary. D. C. Fraser. Cambridge, England: W.
Heffer & Sons, Ltd., 1954. Pp. 40. 3s. 6d. net.

Methods of research. Carter V. Good and Douglas E. Scates. New York:
Appleton-Century-Crofts, Inc., 1954. Pp. 896. $5.50.

The life and ideas of the Marquis De Sade. Geoffrey Gorer. New York:
The British Book Centre, Inc., 1954. Pp. 244. $3.50.

Child psychology. Fourth Edition. Arthur T. Jersild. New York:
Prentice-Hall, Inc., 1954. Pp. 676. $6.00.

The practice of psychotherapy. C. G. Jung. New York: Bollingen
Series, 140 East 62nd Street, 1954. Pp. 377. $4.50.

Know your reader. George R. Klare and Byron Buck. New York:
Hermitage House, Inc., 1954. Pp. 192. $2.95.

The technique of handling people. Revised Edition. Donald A. and
Eleanor C. Laird. New York: McGraw-Hill Book Company, Inc., 1954.
Pp. 189. $3.75.

Towards an understanding of juvenile delinquency. Bernard Lander.
New York: Columbia University Press, 1954. Pp. 143.

Your child and his art. Viktor Lowenfeld. New York: Macmillan
Company, 1954. Pp. 186. $6.50.

Break down the walls. John Bartlow Martin. New York: Ballantine
Books, 1954. Pp. 310. Paperbound edition $.50. Hardbound edition
$3.50.

A new approach to office management: integrated data processing
through common language machines. Elizabeth Marting, Editor. New
York: American Management Association, 1954. Pp. 62. $2.50.

People's Padre. Emmett McLoughlin. Boston: Beacon Press, 1954.
Pp. 288. $3.95.

How to enjoy yourself. Albert A. Ostrow. New York: E. P. Dutton &
Co., Inc., 1954. Pp. 259. $2.95.



Psychology. William J. Pitt and Jacob A. 
Goldberg. New York: McGraw-Hill Book 
Company, Inc., 1954. Pp. 414. $4.50. 

Psychology and life. Fourth Edition. Floyd 
L. Ruch. New York: Scott Foresman and 
Company, 1954. Pp. 496. $5.00. 

Letters to my daughter. Dagobert D. Runes. 
New York: Philosophical Library, 1954. 
Pp. 131. $2.50. 

Principles of industrial psychology. Thomas 
Arthur Ryan and Patricia Cain Smith. 
New York: Ronald Press Company, 1954. 
Pp. 534. $5.50. 

Selected writings of De Sade. Leonard de 
Saint-Yves. New York: The British Book 
Centre, Inc., 1954. Pp. 306. $6.75. 

Case studies in management development: 
theory and practice in ten selected com- 
panies. Robert G. Simpson. New York: 
American Management Association, 1953. 
Pp. 140. $2.50. 

A survey of management development: the 
quantitative aspects. Joseph M. Trickett. 
New York: American Management Asso- 
ciation, 1953. Pp. 64. $1.25. 

Management education in American business. 
Lyndall F. Urwick. New York: American 
Management Association, 1953. Pp. 136. 
$1.50. 

An annotated bibliography of word associa- 
tion references important to marketing re- 
searchers. James M. Vicary. New York: 
James M. Vicary Company, 20 East 60th 
Street. Pp. 5. Gratis. 

The education of employees: a status report. 
Douglas Williams and Stanley Peterfreund. 
New York: American Management Asso- 
ciation, 1953. Pp. 65. $1.25.

Personality through perception: an experi- 
mental and clinical study. H. A. Witkin, 
H. B. Lewis, M. Hertzman, K. Machover, 
P. Bretnall Meissner, and S. Wapner. New 
York: Harper & Brothers, 1954. Pp. 571. 
$7.50. 

Audio-visual materials: their nature and use. 
Walter Arno Wittich and Charles F. Schul- 
ler. New York: Harper & Brothers, 1953. 
Pp. 564. $6.00. 

Psychology in the nursery school. Nelly 
Wolffheim. New York: Philosophical Li- 
brary, 1953. Pp. 144. $3.75. 




Journal of counseling psychology. C. Gilbert 
Wrenn, Editor. Business Office: Room z, 
Old Armory, Ohio State University, Co- 
lumbus 10, Ohio. $6.00 per year. $1.75 
per issue. Issued bi-monthly. 

Reading rapidly and well. Revised Edition. 
C. Gilbert Wrenn and Luella Cole. Stan- 
ford, Calif.: Stanford University Press, 
1954. Pp. 16. $.15. 

The language of dynamic psychology. Jo- 
seph W. Wulfeck and Edward M. Bennett. 
New York: McGraw-Hill Book Company, 
Inc., 1954. Pp. 111. $4.00. 

Administration and the teacher. William A. 
Yeager. New York: Harper & Brothers, 
1954. Pp. 577. $4.50. 

The pre-adolescent exceptional child. Child Research Clinic of the
Woods Schools. Langhorne, Pa.: The Child Research Clinic of the
Woods Schools, 1953. Pp. 70. Gratis.

This we believe about education. Educational Advisory Committee and
Council. New York: National Association of Manufacturers. Pp. 32.

Studies in schizophrenia. Tulane Depart- 
ment of Psychiatry and Neurology. Cam- 
bridge, Mass.: Published for the Common- 
wealth Fund by the Harvard University 
Press, 1954. Pp. 619. $8.50. 

Statistics of public secondary day schools, 1951-1952. U. S.
Department of Health, Education, and Welfare. Washington 25, D. C.:
Superintendent of Documents, U. S. Government Printing Office, 1954.
Pp. 81. $.35.



Journal of Applied Psychology
Vol. 38, No. 5                                      October, 1954


Studies in Industrial Empathy: III. A Study of Supervisory
Empathy in the Textile Industry *


Wendell M. Patton, Jr.
Bruce Payne & Associates, Inc.


Summary


The increasing difficulties and growing responsibilities of the
position of supervisor, particularly in the realm of human
relations, suggest the need for the investigation of the ability of
supervisors to understand both their superiors and subordinates.
With this in mind, a study was undertaken of the empathetic ability
of these supervisors and the extent to which this ability was
related to other psychological variables.

Data were obtained from a large textile manufacturing plant
producing prints and materials from spun rayon. The results are
based upon the replies of 54 secondhands or front-line supervisors,
18 members of top management, and a random sample of 243 out of
2,496 employees.

It was found that the secondhands were not empathizing effectively
with either labor or management. Instead they were projecting
positively toward labor and negatively toward management. A
social-psychological gap existed between labor and management which
the supervisors were unable to perceive. Intelligence, education,
and scores on the test, How Supervise? (5) were positively related
to the supervisors’ empathetic ability for both labor and
management; age and supervisory experience were negatively related.
The particular shift and department in which a supervisor was
employed apparently had no effect on empathetic ability. Empathetic
ability was no greater among supervisors who were considered by
management to be the best than those considered by management to be
the worst. Intercorrelations between related variables and the
supervisors’ empathy scores showed that the supervisors’ own
attitudes and knowledge were the chief factors influencing
projection.

The findings indicated important individual differences in
empathetic ability and the possibility of predicting from a
regression equation the supervisor’s ability to empathize with
either labor or management.

* This paper is based upon research [...] the writer’s [...]
directed by Dr. H. H. Remmers, Purdue University. The dissertation,
A Study of Certain Psychological Variables Related to Supervision in
the Textile Industry, is on file in the Purdue University Library.


The Problem 


Today the American industrial system has 
become a house divided against itself. In 
industrial enterprise the supervisor is the 
direct connecting link between labor and 
management. The increasing difficulties, com- 
plications, and responsibilities of textile su- 
pervision have made it increasingly neces- 
sary to devote more effort to determining 
some of the psychological characteristics of 
good leadership and of the men now occupy- 
ing these positions. In the final analysis it 
is the supervisor who determines whether or 
not a given worker will keep his job or be 
fired or promoted. It is this supervisor who 
gives the orders and carries the directives of 
management to the workers. It is this same
supervisor who has the only direct personal
contact with the workers, and to these work-
ers his actions and decisions are direct ex-
pressions of company policy. Since efficient
supervision demands a two-way channel of
communication, it appeared likely that those
individuals who have the ability to “put
themselves in the other fellow’s shoes” and
anticipate their responses would best be able
to carry the directives of management to
labor and the needs and attitudes of labor
to management. This ability (empathy) and
its relation to other psychological variables
of supervision constitute the basis of this
study.


Background


Though the concept of empathy is of comparatively recent origin, the
possibilities of its value in various situations have not escaped
the attention of investigators. Remmers (8), for example, used this
concept when he was called upon to develop an experimental design to
test the procedures used to reduce the social-psychological gap
between labor and management in a large industrial organization.
Davidoff (1) was concerned with the reciprocality of empathy between
Negroes and whites while Miller and Remmers (7) studied the
psychological distance between organized labor and management.
Interest in the attitudes of labor leaders toward industrial
supervision was shown by Remmers and Remmers [...] touched. Even
[...] empathetic ability than one who has low ability (4). A
knowledge of the many variables affecting empathy and the empathetic
ability within an individual at different times would also add to
[...]ity is important for directing the work of others.



Procedure 


Remmers (8) operationally defines empathy as “. . . having the
subject or subjects predict the ordinal or cardinal position of
another individual or group on one or more scales of defined
psychological dimensions.” The scale chosen for this study was How
Supervise? (5). This test was administered to all front-line
supervisors (secondhands), general foremen (overseers), members of
top management and to a random sample of 10% of the employee group.
It was also administered to the secondhands on two other occasions:
once with instructions to answer each question as they believed
management would answer it, and again with instructions to answer
each question as they believed the employees would answer it. For
the purpose of this study the scoring consisted of counting the
correct responses as determined by the answer key. The index of
empathy was computed by determining the difference between the
predicted scores for a given group and that same group’s actual mean
score.
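
As an illustration of the index just described, the short sketch below recomputes it from the group means reported in Table 2. It is a minimal sketch and not the author's own procedure: it assumes the index can be taken at the group level as predicted mean minus actual mean (so that zero would indicate perfect empathy), and the names `ACTUAL_MEAN`, `PREDICTED_MEAN`, and `empathy_index` are hypothetical.

```python
# Group means on How Supervise?, as printed in Table 2.
ACTUAL_MEAN = {"management": 53.84, "labor": 44.14}
PREDICTED_MEAN = {"management": 46.87, "labor": 48.72}  # supervisors' predictions


def empathy_index(predicted: float, actual: float) -> float:
    """Predicted mean minus actual mean (assumed sign convention).

    Positive values mean the group's score was overestimated,
    negative values that it was underestimated.
    """
    return round(predicted - actual, 2)


# Index of empathy for each group the supervisors were asked to predict.
indices = {
    group: empathy_index(PREDICTED_MEAN[group], ACTUAL_MEAN[group])
    for group in ACTUAL_MEAN
}

# Management-minus-labor distance, actual and as predicted by supervisors.
actual_gap = round(ACTUAL_MEAN["management"] - ACTUAL_MEAN["labor"], 2)
predicted_gap = round(PREDICTED_MEAN["management"] - PREDICTED_MEAN["labor"], 2)

print(indices, actual_gap, predicted_gap)
```

Under these assumptions the index is negative for management and positive for labor, which agrees in sign and size with the Difference row of Table 2.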

These same supervisors were also administered The Adaptability Test
(11) which was designed to yield a general measure of intelligence.
Information such as age, sex, experience and education was obtained
from the personnel records and a personal history blank which all
supervisors completed. Since no suitable production records were
available for a criterion of supervisory efficiency, ratings of
supervisors by superiors were used. Each supervisor was rated by at
least three superiors and from these ratings a rank-order list was
formed. Data from these sources served to test relevant hypotheses.


Results 


The extent to which textile supervisors, labor and management
understand the psychologically best methods of supervision is shown
in Table 1. The front-line supervisors scored higher than labor but
management scored higher than either the supervisors or the workers.
These differences were significant at the 1% level of confidence.


Table 1

Comparison of the Mean Scores of Labor, Front-line Supervisors
and Management on How Supervise?

                                           Standard    Standard
Group                    Number    Mean    Deviation   Error
Labor                      243     44.1      10.3        .66
Front-line Supervisors      54     48.1       8.5       1.17
Management                  18     53.8       4.7       1.14


Table 2

Comparison Between the Actual Social-Psychological Distance Between
Management and Labor as Measured by How Supervise? and the
Front-line Supervisors’ Prediction of this Distance

                       Management    Labor    Difference
Actual Mean               53.84      44.14       9.70
  Standard Error           1.14        .66       1.31
Predicted Mean            46.87      48.72      -1.85
  Standard Error           1.16       1.04       1.55
Difference                -6.97      +4.58      11.55
  Standard Error           1.62       1.23       2.06

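
The significance statements attached to Table 2 can be checked roughly from the printed figures. The sketch below assumes the conventional critical ratio (a difference divided by its standard error) referred to the two-tailed normal cutoff of 2.576 for the 1% level. The paper does not state which test was actually used, and the dictionary keys and function name are illustrative only.

```python
# (difference, standard error) pairs as printed in Table 2.
TABLE_2 = {
    "actual: management - labor": (9.70, 1.31),
    "predicted - actual, management": (-6.97, 1.62),
    "predicted - actual, labor": (4.58, 1.23),
    "actual gap - predicted gap": (11.55, 2.06),
}

# Two-tailed cutoff of the normal distribution at the 1% level (assumed test).
Z_CRITICAL_1_PERCENT = 2.576


def critical_ratio(difference: float, standard_error: float) -> float:
    """Difference expressed in units of its own standard error."""
    return difference / standard_error


ratios = {
    label: round(critical_ratio(diff, se), 2)
    for label, (diff, se) in TABLE_2.items()
}
significant = {
    label: abs(ratio) > Z_CRITICAL_1_PERCENT for label, ratio in ratios.items()
}

for label, ratio in ratios.items():
    print(f"{label}: CR = {ratio}, beyond 1% cutoff: {significant[label]}")
```

On this reading every difference in Table 2 lies well beyond the 1% cutoff, in line with the levels of confidence reported in the text.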

Table 2 shows a comparison between the actual social-psychological
distance between management and labor and the supervisors’
prediction of this difference. The supervisors tended to
overestimate the workers’ knowledge of the best methods of
supervision by a mean difference of 4.58. They also tended to
underestimate management's knowledge of the best methods of
supervision by a mean difference of 6.97. Both of these differences
were significant at the 1% level of confidence. The
social-psychological distance is indicated by the difference of
11.55 which is also significant at the 1% level of confidence. From
the data shown in Table 2 it becomes obvious that the supervisors
are not empathizing optimally with either management or labor.
Instead they are projecting in both cases: positively toward labor
and negatively toward management.

A Pearson r of +.61 was found between the supervisors’ own scores on
the test How Supervise? and the scores they predicted for labor. A
Pearson r of +.44 was found between the supervisors’ own scores and
the scores they predicted for management. Both correlations were
significant at the 1% level of confidence. This relationship very
clearly implies that one important reason for the failure of
supervisors to empathize is the projection of their own attitudes
and knowledge to these other groups.

Of the supervisory variables investigated, five appeared to be
related to empathetic ability to such extent as to be considered of
practical importance in the prediction of this ability. The
intercorrelations of these five variables are shown in Table 3. The
multiple correlation between the variables shown and the
supervisors’ predictions of management’s responses was indicated by
R.12345 = +.53. The relationship between the same variables and the
supervisors’ predictions of labor’s responses was shown by
R.12345 = +.67. When these correlations are compared with the
correlations for a single variable of the supervisors’ own scores,
it is evident that this particular variable contributes the most
toward the total relationship.

A Pearson r of +.10 was found between the rank order of the
supervisors as rated by their superiors and their predictions for
labor. The correlation between this rank-order list and predictions
for management was found to be +.09. Both of these correlations were
too small to be of significance to this study. Consequently, it
appears that the supervisors considered by management to be their
best are able to empathize only about as well as those considered by
management to be the poorest. Because of the small number of cases
and the inherent weaknesses in the rank order, the results must be
interpreted with caution.


Table 3

Intercorrelations of Five Variables Found to be Related to
the Supervisors’ Ability to Empathize

                              Adaptability            Supervisory
                              Scores         Age      Experience    Education
1. How Supervise? Scores       +.48          -.29      -.2[...]       +.31
2. Adaptability Scores                       -.44      [...]          +.68
3. Age                                                 +.72           -.45
4. Supervisory Experience                                             -.32

Conclusions


In short, the findings indicate that the textile supervisors are
unable to empathize optimally with either labor or management and
that empathetic ability is related to certain psychological
variables of supervision in the textile industry. The empathetic
ability of a supervisor as here operationally defined can be
predicted from a regression equation. The principal reason for the
failure of the supervisors to understand management and labor is the
projection of their own feelings, attitudes and knowledge upon these
groups.

Received October 7, 1953.


References

1. Davidoff, M. D. A study of empathy and correlates of prejudice
   toward a minority group. Unpublished Ph.D. thesis, Purdue
   University, Lafayette, Indiana, 1948.
2. Dymond, R. F. A preliminary investigation of the relation of
   insight and empathy. J. consult. Psychol., 1948, 12, 228-233.
3. Dymond, R. F. A scale for the measurement of empathic ability.
   J. consult. Psychol., 1949, 13, 127-133.
4. Dymond, R. F. Personality and empathy. J. consult. Psychol.,
   1950, 14, 343-350.
5. File, Q. W. The measurement of supervisory quality in industry.
   J. appl. Psychol., 1945, 29, 323-337.
6. [...] in labor relations. Studies in Industrial Relations No. 10.
   Stanford: Stanford University Press, [...].
7. Miller, F. G. Studies in industrial empathy: II. The measurement
   of the gap between industrial management and organized labor.
   Unpublished Master’s thesis, Purdue University, Lafayette, Ind.,
   1949.
8. Remmers, H. H. A quantitative index of social-psychological
   empathy. Amer. J. Orthopsychiat., 1950, 20, 161-165.
9. Remmers, L. J., and Remmers, H. H. Studies in industrial empathy:
   I. Labor leaders’ attitudes toward industrial supervision and
   their estimate of management’s attitudes. Personnel Psychol.,
   1949, 2, 427-436.
10. Richards, A. W. A study of some industrial relations variables
    in one industrial company. Unpublished Master’s thesis, Purdue
    University, 1950.
11. Tiffin, J., and Lawshe, C. H. The Adaptability Test: A fifteen
    minute mental alertness test for use in personnel allocation.
    J. appl. Psychol., 1943, 27, 152-163.
12. Travers, R. M. W. A study in judging the opinions of groups.
    Arch. Psychol., 1941, 260.




THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Organization Control in Business 


L. R. Gaiennie


Personnel Department, Fairbanks, Morse & Co., Chicago, Illinois


It is becoming increasingly apparent that traditional organization
charts, with their job titles and lines of authority, represent only one
aspect of a business organization. They are two-dimensional still lives
of a living institution and can be likened to anatomy as contrasted with
physiology. Anatomy studies parts and organs of the body at rest, while
physiology attempts to understand them in action. Organization charts
fail to show actual relations between the jobs and people of an
institution. This is because the positions of a company are populated by
human beings who are constantly acting and interacting to each other and
to the changing conditions of business. Of course, such charts have
their proper place in personnel control. Scientific personnel work was
founded upon analysis of the jobs and [illegible] of an organization. It
has become commonplace to point out that job descriptions are to the
personnel executive what blueprints and material specifications are to
the engineer. Too often, however, the personnel executive accepts his
organization merely because it has been formalized by charts and
descriptions, and proceeds to select and train employees to fill the
positions thus created. [Such] emphasis upon job [analysis has produced]
an engineering-minded personnel administration. Such departments tend to
treat people as a means rather than as an end in themselves. People are
categorized as units [illegible], not as people. The attempt to treat
people as a means rather than an end alienates them from a sense of
belonging with management to the economy as we know it. Work likewise
becomes a means: something outside a person's real interests and goals;
something with which to obtain an automobile or a television set;
something to be regarded as a cost rather than a good in itself. One of
the reasons why this is so lies in the fact that management, under the
influence of an atomistic engineering science, has broken down its job
organization in such a way as to deprive employees of much of their
creative relationship to work. The selection-minded approach to filling
jobs so created has led to discouragingly meager results over the last
twenty years.

It has become obvious that the personnel 
administrator’s functions must go beyond 
mere analysis and acceptance of his organiza- 
tion. After all is said and done, personnel 
efficiency is measured by the success of the 
company and the people in it. Today, per- 
sonnel administrators and psychologists are 
thus enlarging their field of interest—upward 
from the skilled hourly workers and outward 
toward the relationship between the positions 
in a given organization. The larger concept 
of “organization control” is enriching the 
older field of “employee selection.” This 
growth and interest is indeed heartening and 
represents the growing maturity of both per- 
sonnel and industrial psychologists. A means 
of relating these two aspects of industrial or- 
ganization to each other is now needed if ade- 
quate organization control is to be achieved. 
If these two structures can be measured in 
similar terms, then some progress may be 
made, since progress in any field is largely 
dependent upon quantifying the data under 
consideration. 

To perform his function effectively, the 
personnel executive must take a critical look 
at both the “job structure” and the “people 
structure” of his company. Both functions 
and people within the company must com- 
bine to produce harmony and profits. Thus, 
business organization has at least two struc- 
tures: (1) the make-up and relationship be- 
tween the various positions of the company; 
and (2) the make-up and relationship between the various persons
occupying these positions. As has been pointed out, these two structures
are merely separate aspects of the same problem. Complete understanding
of each is dependent upon understanding the other, and the purpose of
personnel control is to achieve proper job and personnel structures and
assure a balance between them.
Organization control is an attempt to meas- 
ure degrees of conformity between the job or- 
ganizational requirements, and the abilities 
and performance of job incumbents. Organi- 
zational efficiency is, in this respect, equivalent 
to the cost accountant’s measure of standard 
versus actual performance. In this sense, or- 
ganization control is designed to lead toward 
remedial or preventive action and as such lifts 
the older concept of employment personnel 
to a new and more dynamic level. 

For some time, various rating techniques 
have been used for measuring the relative 
complexity of jobs as a means of relating all 
jobs to one another. These techniques, when 
applied to management positions, are usually 
termed “position evaluation” and form the 
basis of most modern salary administration. 
Considerable work has been done to estab- 
lish the reliability of such data and, in gen- 
eral, it has been accepted by employees and 
management as the most rational and objec- 
tive basis so far developed for measuring the 
relative value of jobs. In order to achieve 
our objective of relating the two aspects of 
organization one to the other, evaluation ele- 
ments which can be applied with equal ease 
to both jobs and people must be used. Ex- 
amples of such elements are: planning ability, 
skill with people, job knowledge, quality of 
work, responsibility, and experience. All the 
elements (factors) are defined in a manner 
equally applicable to both the job and the 
job incumbent, as are the various grades 
within each element. 

By applying the usual rules of job evalua- 
tion to all of the management positions, data 
are obtained. After this has been done, each 
job incumbent is rated for the same elements 
against the ratings for the job he occupies. 
Evaluation of job incumbents is performed 
in a completely separate series of rating ses- 
sions. The same or different people may be 
used. The same principles are applied in 
evaluating people as are used in evaluating 
the jobs. 

The evaluating sessions for job incumbents 
differ from the position evaluation series only 
in that: (a) different evaluators may be used 
for the two sessions; (b) one series evaluates 
people and the other evaluates positions; and (c) the incumbent is rated
against the requirements for the position he occupies.

In the studies so far made, the data from 
the two evaluations have been entered on mas- 
ter cards. These measures are then treated 
statistically to arrive at total scores for each 
job and each job incumbent. Comparisons 
between positions and people can then be 
made and it can readily be seen if an employee
exceeds, equals, or is beneath the job
requirements. These techniques are subject
to all of the limitations and errors of any 
rating procedure. 
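
The comparison described above can be sketched roughly as follows. The element names come from the examples given earlier in the text; the numeric grades, the unweighted sum used as a "total score," and the function names are assumptions made for illustration, since the paper does not say how the measures are treated statistically.

```python
# Hypothetical sketch: total the element ratings for a job and for its
# incumbent, then compare the two totals. The grades and the unweighted
# sum are assumptions, not the paper's stated procedure.

ELEMENTS = (
    "planning ability",
    "skill with people",
    "job knowledge",
    "quality of work",
    "responsibility",
    "experience",
)


def total_score(ratings):
    """Add up the grades over all evaluation elements."""
    return sum(ratings[element] for element in ELEMENTS)


def compare(position_ratings, incumbent_ratings):
    """Say whether the incumbent exceeds, equals, or is beneath the job."""
    difference = total_score(incumbent_ratings) - total_score(position_ratings)
    if difference > 0:
        return "exceeds"
    if difference < 0:
        return "beneath"
    return "equals"


# Made-up grades on an assumed 1-5 scale.
job = dict(zip(ELEMENTS, (4, 3, 4, 3, 4, 3)))        # total 21
incumbent = dict(zip(ELEMENTS, (3, 4, 4, 3, 3, 3)))  # total 20

print(compare(job, incumbent))
```

Keeping the element-level ratings, rather than only the totals, is what would allow the element-by-element comparisons the paper goes on to discuss.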

When similar cards have been filled out for 
all employees and positions, an almost in- 
finite number of comparisons can be made. 
Some of the most obvious are: 


1. Comparisons between jobs. This can be used for such purposes as to
obtain a better organization, increase or decrease the job content of
certain positions, or to establish salary structure.

2. Comparisons between people. Since all the people have been evaluated
on the same basis, direct comparisons can be made.

3. Comparisons of jobs and people. This information can be used for
training, organization control, upgrading, and standardization of
psychometric tests.

This approach highlights the fact that reduced variances between the
requirements of the job and the abilities of job incumbents can be
achieved in several ways; namely:


1. By modifying job content. Where a discrepancy exists between the job
and the incumbent it is possible to add or subtract duties and
responsibilities.

2. By changing personnel. Balance between job and personnel can be
brought about through training or transfer of personnel to achieve
maximum use of company manpower.

3. By changing both job content and personnel.


If the evaluation data for positions and people thus obtained are
charted, the resulting series of curves demonstrate in quantitative
terms the organization as a whole. That is to say, it now becomes
possible to study both the groups of jobs and the groups of people since
all have been reduced to common denominators. Curves of this kind are
developed by arranging the positions of the organization in their order
of complexity. It is then possible to plot both personnel and position
ratings, keeping the data arranged in the order of the total position
evaluations.

Charts for each evaluation element are also possible. For purposes of
the present discussion, the following terms are defined:


1. Position Gradient: that curve obtained by arranging the positions in
ascending order of their total position evaluation scores along the
abscissa and plotting their total or element position scores on the
ordinate.

2. Personnel Gradient: that curve obtained by arranging the job
incumbents in ascending order of their total position evaluation scores
along the abscissa and plotting their total or element personnel
evaluation scores on the ordinate.

3. Positive Variance: any portion along a total or element evaluation
curve where the abilities of the job incumbent are judged to exceed the
job requirements.

4. Negative Variance: any portion along a total or element evaluation
curve where the job requirements exceed the abilities of the incumbent.
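
The four terms just defined can be illustrated with a short sketch. The position titles and scores are invented, and the helper names `build_gradients` and `variance` are hypothetical; the only thing taken from the text is the rule that both curves are ordered by total position score and that variance is positive where abilities exceed requirements and negative where requirements exceed abilities.

```python
# Hypothetical sketch of position and personnel gradients.
# Each record: (position title, total position score, incumbent's total score).
# All figures are invented for illustration.

records = [
    ("Plant manager", 30, 27),
    ("Foreman", 18, 21),
    ("Superintendent", 24, 24),
]


def build_gradients(rows):
    """Order positions by total position score (the abscissa) and return
    the two series that would be plotted on the ordinate."""
    ordered = sorted(rows, key=lambda row: row[1])
    titles = [row[0] for row in ordered]
    position_gradient = [row[1] for row in ordered]
    personnel_gradient = [row[2] for row in ordered]
    return titles, position_gradient, personnel_gradient


def variance(position_score, personnel_score):
    """Positive when abilities exceed requirements, negative when
    requirements exceed abilities, otherwise none."""
    if personnel_score > position_score:
        return "positive"
    if personnel_score < position_score:
        return "negative"
    return "none"


titles, position_gradient, personnel_gradient = build_gradients(records)
variances = [variance(p, q) for p, q in zip(position_gradient, personnel_gradient)]
for title, label in zip(titles, variances):
    print(title, label)
```

In this made-up example the foreman shows positive variance and the plant manager negative variance.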


Hypotheses for Experimental Test 


Besides practical value to personnel executives, this method of
analysis, in spite of its limitations, has certain advantages to those
interested in organizational theory because of its quantitative nature.
An almost infinite number of relationships between the personnel and
position data can be isolated and made the subject of more detailed
study. Because there has been so little research in this field, the
following suggestive questions and hypotheses are stated as a means of
stimulating further work on these and related organizational problems:


1. Can selective devices, such as standardized tests, be developed using
job requirements as the criterion? The typical test standardization in
industry has been to select a group of employees working on related
jobs; separate the good from the poor performers; and to standardize the
tests on this criterion. Two major weaknesses are inherent in this
approach: (1) the groups used in such a process tend to be small, thus
reducing the reliability of the data; and (2) ability measures do not
necessarily measure employee performance due to such factors as
motivation. By using job evaluation elements such as "planning ability"
as the criterion, whole populations can be tested and used in
standardizing tests. This approach allows for cutting scores by job type
and separates out the motivational aspect from the testing devices for
separate measures.

2. Is it better to place individuals in posi- 
tions which just equal, exceed, or are less 
than their abilities? This question strikes 
more directly at the problem of motivation 
and related problems. Using data obtained 
through this technique, it is possible to segre- 
gate separate populations from one or more 
organizations as follows: (a) greatly exceed 
job requirements; (b) exceed job require- 
ments; (c) equal job requirements; (d) be- 
neath job requirements; and (e) greatly be- 
neath job requirements. Having isolated the 
groups for study, various experimental de- 
signs can be used to ascertain their relative 
efficiency. If desired, it is also possible to 
segregate still other subpopulations within 
each one of the above groups. For example, 
those who exceed job requirements could be 
subdivided into those who equal, are beneath, 
or are above the job requirements on a par- 
ticular element. 

3. What are the effects of personnel re- 
versals upon organization performance and 
morale? There is evidence that some of the 
most uncooperative union stewards are an- 
tagonistic because they are more capable than 
the foreman over them. Applied to manage- 
ment organization, what are the practical re- 
sults of such a situation? Is it the same at 
all levels of an organization? The question 
of reversals and their effect upon organization 
efficiency should probably be studied at three 
points on the curves: (1) those who repre- 
sent reversals; (2) their superiors; and (3) 
their subordinates. 

4. Is it possible to have an efficient organization without positive
position and personnel gradients? In recent years, there has been
considerable discussion regarding democracy in business, expressed by
such terms as "bottom-up management." Is this feasible as applied to the
two organizational structures herein discussed? The proponents of
bottom-up management, to the extent that they would modify personnel or
position gradients, have an opportunity to study various type gradients
and to report the relative efficiencies of each.

5. Given a particular set of conditions, 
such as size of organization, type of activity, 
etc., are there particular gradients which re- 
turn optimal results? If positive answers can 
be given to this question, businesses might 
be spared thousands of dollars in cost as 
they establish or reorganize their operations. 
There is some evidence that certain general- 
izations may be discoverable. For example, 
the writer has heard competent business ex- 
ecutives express the following ideas, which are 
open to experimental test under the method 
outlined in this paper: 


a. Job-shop organizations require more 
complex position and personnel gradi- 
ents than production organizations. 

b. Large organizations tend to develop or- 
ganization gradients significantly differ- 
ent from small organizations. 

c. "Mental" organizations or departments
such as engineering, research, and
development demonstrate flat gradients as
compared to the typical manufacturing
line organization.


6. Are there optimal gradients which should 
be established as objectives if it is anticipated 
that a given organization is going to expand 
or contract? The organizational strains due 
to change are especially apparent during quick 
expansion or after continued long-term 
growth of a company. When this happens, 
previous methods and personnel must adjust 
to the new situation. The following hy- 
potheses are suggested as being subject to 
experimental verification: 


a. In a given business organization, if the job gradient expands and
moves quickly upward on the ordinate, severe organizational strains
will occur unless the personnel gradient is caused to do likewise.

b. Organizational strains will ensue if the job gradient changes its
relative shape while the personnel gradient remains the same.
Empirically speaking, it is probable that production control
installations have a high mortality rate because they are frequently
installed by outsiders who convince top management their systems can
be installed without disturbing existing personnel. When this
happens, the new system frequently creates an organizational strain
(modifying the job but not the personnel gradient; thus establishing
negative variance) or the existing personnel through mass
[illegible] finally defeat the new system.


7. Given a particular organization, are there optimal organizational
curves which relate to particular company policies and procedures?
Observations to date indicate that such may be the case. For example,
highly centralized multi-plant companies display different
organizational gradients than decentralized multiple plant operations.
Certain corporations which have centralized or decentralized their
organizations in recent years are known to have created organizational
strains which might have been reduced if they had considered their
existing gradients in relation to proposed objectives before starting
their programs.

8. Can training programs be made more realistic and be given to those
employees who need assistance in the particular problem areas uncovered?
Work so far indicates that much of the training time spent in industry
is of a blunderbuss variety. Companies are too often prone to dangle a
watch in front of bored employees in the hope that such sessions will
somehow improve performance. This technique allows for selective
training or transfer of employees based upon [measurements] taken. In
addition, it allows for [establishing] measures to ascertain the
relative effectiveness of such programs.

The above questions and hypotheses are meant to be suggestive of the
kinds of problems which can be attacked experimentally through use of
the position and personnel evaluation method. Further work on these and
related problems is badly needed to establish factual guides for the
business executive.


Received October 7, 1953. 




Quantitative Analysis of Verbal Evaluations *


Sidney H. Newman


U. S. Public Health Service, Washington 25, D. C.


[illegible] used, verbal evaluations in [illegible] performance reports
furnish "im[illegible] and qualitative observations." Quantitative
analysis of such comments can [increase] the usefulness of these reports
and establish a [more] reliable basis for comparing individuals. This
paper describes the development and reliability of a procedure for
scoring comments obtained from an efficiency report used to evaluate the
job performance of [commissioned] officers in the United States Public
Health Service. This work is part of the research program discussed by
Newman (4).


Method 


The quantitative method developed here for the analysis of verbal
evaluations involved the adaptation of three well-known techniques.

1. The method of content analysis suggested the classification of
supervisors' comments into categories. Content analysis has been
employed extensively by Lasswell and his associates (3) in analyzing the
political [illegible] content of mass media.

2. The technique evolved by Thurstone (7) for scaling attitudinal
statements was used to assign values to comments classified in each of
the categories. Other investigators such as Uhrbrock (8) have applied
the Thurstone technique to the scaling of statements concerning job
performance and personal characteristics.

3. Methods like those introduced by Thorndike (6) for measuring the
quality of handwriting were the basis for the use of a master scale for
scoring each comment.

The procedure used in establishing a system for scoring the comments in
the efficiency report consisted of the following steps:

1. A total of 779 comments were collected from the "remarks,"
"handicaps," and "recommendations" sections of several hundred officer
efficiency reports. A comment was defined as any word, phrase, or clause
constituting a unitary evaluative description of the officer upon whom
the report was prepared.


* The writer wishes to acknowledge with appreciation the [assistance] of
Mrs. Jane S. Harris, who was Scorer [1?] and who also did much of the
statistical computation.


2. By sorting and grouping all comments, it was possible to establish 12
descriptive categories relevant to officer characteristics deemed
important to the Service and sufficiently independent of each other to
allow classification of comments. These categories are listed in
Table 1.

3. The comments in each of the categories were placed on a nine-point
scale from "undesirable" (1-3), to "neutral" (4-6), to "desirable" (7-9)
by 11 Public Health Service judges, most of whom were psychologists.
Each comment was assigned a numerical value for use in scoring; this
value was the median of the scale placements made by the different
judges.

4. A scoring manual was constructed by listing, in each of the 12
categories, the comments with their median values. In using the manual,
the scorer identifies a comment and classifies it in one of the 12
categories. This process of identification and classification is defined
here as coding. He then matches each comment as closely as possible with
one in its category in the manual and assigns it the listed numerical
value. The numerical values of all comments in an efficiency report are
averaged to obtain the raw score for the verbal evaluation parts of the
report. In this article, the entire process of arriving at scores,
involving both the coding of comments and assigning of numerical values
to them, is termed scoring.

5. Reliabilities of the scale placements and the scoring methods were
determined.


Table 1

Reliability Coefficients for Scale Placements in Each Category

                                                  No.
Category                                          Items   $r_{1\,1}$*   $r_{11\,11}$†
General evaluation                                 69       .90           .99
Potentiality for future                            32       .86           .99
Training and experience                            30       .87           .99
Relations with work associates                     35       .85           .98
Relations with official groups                     16       .93           .99
Relations with patients and public                 26       .95           .99
Motivation                                         39       .88           .99
Job proficiency                                    70       .89           .99
Job progress                                       23       .89           .99
Potentiality as candidate for Regular Corps        23       .89           .99
Work attitudes                                     60       .86           .99
Intellectual qualities                             88       .95           .99
Intellectual qualities
  ("duplicate" items removed) ‡                    46       .93           .99

* $r_{1\,1}$ = Average of correlation of each judge's placements with
every other judge's placements.
† $r_{11\,11}$ = Reliability of average of 11 judges.
‡ To test the hypothesis that the high correlations were produced by
"duplicate" items, these were removed. "Duplicate" items were defined as
those having: (a) the same adverb modifiers; and (b) scale placements
differing by no more than one place. It may be seen that removing these
items has very little effect on the correlation.
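
Steps 3 and 4 of the procedure lend themselves to a brief sketch. The comments and judge placements below are invented, and the exact dictionary lookup stands in for what the paper describes as a human judgment (matching a comment "as closely as possible" with a manual entry); only the use of the median for scale values and the average for the report's raw score comes from the text.

```python
from statistics import mean, median

# Scale placements (1 = undesirable ... 9 = desirable) by 11 judges.
# The comments and placements below are invented for illustration.
placements = {
    ("Motivation", "shows great initiative"): [8, 9, 8, 7, 8, 9, 8, 8, 7, 9, 8],
    ("Job proficiency", "work is often careless"): [2, 3, 2, 2, 1, 3, 2, 2, 3, 2, 2],
}

# Step 3: the scale value of a comment is the median of the judges' placements.
manual = {key: median(values) for key, values in placements.items()}


def raw_score(coded_comments, scoring_manual):
    """Step 4: average the listed values of all comments in one report.

    coded_comments holds (category, comment) pairs already matched to
    manual entries; the matching itself is a human judgment in the paper.
    """
    return mean(scoring_manual[key] for key in coded_comments)


report = [
    ("Motivation", "shows great initiative"),
    ("Job proficiency", "work is often careless"),
]
print(raw_score(report, manual))
```

Using the median rather than the mean for the scale value keeps a single judge's extreme placement from shifting a comment's listed value.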


Results 


Reliability of Scale Placements. As obtained by the method of average
intercorrelation, Peters and Van Voorhis (5), the reliabilities of the
scale placements made in each of the 12 categories by the 11 judges are
shown in Table 1.

For comparative purposes, split-half coefficients based on correlations
of the averages of five judges with those of five other judges and
stepped up for 10 judges by the Spearman-Brown formula were computed for
four categories. The results for 10 judges are similar to those obtained
by the method of average intercorrelation for 11 judges (see Table 1):
training and experience, $r_{5\,5}$ = .9[?], $r_{10\,10}$ = .97;
relations with work associates, $r_{5\,5}$ = .98, $r_{10\,10}$ = .99;
relations with patients and public, $r_{5\,5}$ = .99, $r_{10\,10}$ = .99;
proficiency, $r_{5\,5}$ = .98, $r_{10\,10}$ = .99.
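
As a minimal sketch of the stepping-up mentioned above, the standard Spearman-Brown formula predicts the reliability of a measure lengthened k times from the reliability r of one unit, k·r / (1 + (k − 1)·r). The function name is hypothetical. Applying the formula to the single-judge coefficients of Table 1 with k = 11 is an assumption about how the second column was obtained, though the rounded results agree with the tabled values.

```python
def spearman_brown(r, k):
    """Reliability of a composite k times as long as the unit whose
    reliability is r (standard Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)


# Single-judge coefficient stepped up to the average of 11 judges
# (values taken from Table 1: General evaluation and work associates).
print(round(spearman_brown(0.90, 11), 2))
print(round(spearman_brown(0.85, 11), 2))

# Split-half case: correlation between two five-judge averages,
# stepped up to 10 judges (k = 2).
print(round(spearman_brown(0.98, 2), 2))
```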

Reliability of the Scoring Method. The reliabilities of the comment
scores assigned under various conditions are presented in Table 2.

In one method of determining reliability, two people, one trained by the
other, independently scored the comments in a sample of officer
efficiency reports. The scores assigned by the different scorers were
correlated (Scorer 1 vs. Scorer 2).

In the other method, scores assigned by the same scorer on two different
occasions were correlated (Scorer 1 vs. Scorer 1, and Scorer 2 vs.
Scorer 2). Precautions were taken to minimize the effects of memory and
other contaminating factors. During the period intervening between the
two scorings, the scorer worked on other efficiency reports and was not
allowed to see those utilized in the studies. The extent of agreement in
coding comments in both of the scorings is also shown in Table 2 for all
four studies.


Table 2

Reliability of Comment Scoring

                           No. of       Total No.       Comments Coded the     Correlation
                           Efficiency   Comments        Same in Both Scorings  (Comments
                           Reports      Coded in        ---------------------  Coded the
                           Scored       Both Scorings   No.        Per Cent    Same)
Scorer 1* vs. Scorer 2       32           114            99          87          .86
Scorer 1 vs. Scorer 1
  (4 mo. interval)           30            99          [illegible] [illegible]   .95
Scorer 2 vs. Scorer 2
  (4 mo. interval)           39           141          [illegible] [illegible]   .95
Scorer 2 vs. Scorer 2
  (14 mo. interval)          40           151           113          74          .96

* Scorer 1 was more experienced in scoring than Scorer 2.

In the comparison of the scoring done by 
Scorer 2 on two occasions, separated by a 14 
month interval, the comment scores assigned
were also averaged to obtain a single score
for the verbal evaluation parts in each
efficiency report. The average scores assigned
each of the 40 reports on the two different
occasions correlated .94. 


Discussion 


A procedure for quantitatively analyzing 
verbal material for the comparison of officer 
performance has been developed. The placements of verbal comments on a
nine-point scale, basic to the development of the scoring procedure, can
be reliably achieved by a relatively small number of judges. In
agreement with these findings, Uhrbrock (8) obtained high reliabilities
for the Thurstone scale values of descriptive rating scale statements;
Hinckley (2) and Ferguson (1) found that scale values assigned
attitudinal statements by use of the Thurstone technique were highly
reliable.

Three aspects of the reliability of scoring by the use of the scoring
manual were considered: the correlation between scale values assigned
the comments, the correlation between average comment scores, and the
percentage of agreement in the coding of comments.

The reliability of comment scores was found to be higher for scores
assigned by the same scorer on two different occasions (.9[?] in one
study and .96 in the other) than for the scores assigned by two
different scorers. Increasing the lower coefficient might be
accomplished by adding the occasional comments to the large sample in
the [manual] and by increasing the similarity of judgment standards
through cooperative training and scoring.

The reliability of average comment scores is analogous to the
reliability of total scores on a test. In one of the studies (Scorer 2
vs. Scorer 2, separated by a 14 month interval), it was found that the
reliability of average comment scores (.94) was similar to the
reliability of individual comment scores (.96). This suggests that when
the reliability of comment scores is found to be high, average comment
scores will also be reliable.

Agreement in the coding of comments may 
be considered fairly high, since in three out of 
the four studies the percentages of agreement 
were, respectively, 82, 87, and 94 per cent. 
In the remaining study, the percentage of 
agreement was 74 per cent, but this coding 
was done by the more inexperienced scorer, 
with a 14 month interval between the scor- 
ings. This interval probably represented a 
training period in which the scorer may have 
developed more ability to code comments. 
Agreement in the coding of comments might 
be increased, in general, by giving scorers ex- 
tensive training in this aspect of the method. 

The findings on reliability of the procedures 
used here for scoring comments in efficiency 
reports suggest that this method may be use- 
ful for analyzing quantitatively other kinds 
of verbal material. In occupational situa- 
tions, supervisors usually like to make verbal 
reports; this scoring procedure will allow 
quantitative utilization of material which has 
ordinarily been merely “taken into considera- 
tion.” It is also likely that these quantita- 
tive procedures will prove useful in such fields 
as propaganda analysis, the analysis of litera- 
ture, or the analysis of the verbal reports of 
patients or interviewees. Of course, it would 
be necessary to determine the reliability, 
validity, and other relevant characteristics of 
the scores obtained in any given situation. 


Summary and Conclusions 


A method for the quantitative analysis of 
verbal material is presented. Verbal material 
is quantified by categorizing each unit of ma- 
terial (each comment) and comparing it with 
the empirically derived master scoring scale constructed for that
category. The findings show that: (a) the comments in each category can
be reliably placed on a nine-point scale by a relatively small number of
judges; and (b) scores, based on either the individual comments in each
category or the average comment scores for each report, are reliable.
It is suggested that the procedures developed here can be utilized for
other types of verbal material.


Received October 26, 1953. 


References 


1. Ferguson, L. W. The influence of individual attitudes on construction
of an attitude scale. J. soc. Psychol., 1935, 6, 115-117.

2. Hinckley, E. D. The influence of individual opinion on construction
of an attitude scale. J. soc. Psychol., 1932, 3, 283-296.

3. Lasswell, H. D., Leites, N., and associates. Language of politics;
studies in quantitative semantics. New York: G. W. Stewart, 1949.

4. Newman, S. H. The officer selection and evaluation program of the
U. S. Public Health Service. Am. J. Publ. Hlth., 1951, 41, 1395-1402.

5. Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their
mathematical bases. New York: McGraw-Hill, 1940.

6. Thorndike, E. L. Handwriting. Teachers Coll. Rec., 1910, 11, 83-175.

7. Thurstone, L. L. Attitudes can be measured. Amer. J. Sociol., 1928,
33, 529-554.

8. Uhrbrock, R. S. Standardization of 724 rating scale statements.
Personnel Psychol., 1950, 3, 285-316.



Standardization of the GATB for the Occupation of Tabulating
Machine Operator ¹


Minnesota State Employment Service in Cooperation with the
U. S. Employment Service, U. S. Department of Labor, Washington, D. C.


This study is concerned with the prediction of success or failure in the
occupation of Tabulating Machine Operator. It was conducted by the
Minnesota State Employment Service in cooperation with the United States
Employment Service (USES) and the National Machine Accountants
Association (NMAA). The study is an attempt to develop [national] norms
for this occupation. It is [illegible] of previous studies conducted in
Florida, Ohio, and Minnesota, the latter being conducted in cooperation
with the Northwest Chapter of NMAA and the University of Minnesota.


Procedure 


Sample. The sample in this study is composed of 203 operators employed
in four states, viz., California, North Carolina, New Jersey, and
Wisconsin. Of the 203 operators, 96 are women and 107 are men.

[illegible] of operations listed by the International Business Machines
Company (IBM) and by Remington Rand were performed by operators in the
sample. Participating firms were instructed to refer all tabulating
machine operators for testing. If this procedure was not feasible,
operators were to be selected for testing who were representative of
operators employed by the [firm] with respect to age, sex, work-level,
and experience.

All operators had been employed for six months or longer so that they
had completed the probationary period for this occupation.


¹ Ruth E. Potter, State Test Technician of the Minnesota State
Employment Service, had major responsibility for the supervision of the
total study and the preparation of this article. Participating in the
development of the experimental design for the study were Ruth E. Potter
and John R. Boulger of the Minnesota State Employment Service; Dr.
Beatrice Dvorak and Albert Mapou of the United States Employment
Service; and Mr. Wayne Spielman of the National Machine Accountants
Association. At the Minnesota State Employment Service, James Ryan and
Robert Coll conducted the statistical analysis of the data. At the
national office of the United States Employment Service in Washington,
D. C., the following persons participated in the planning of the study,
coordination of the collection of data by the New Jersey, North
Carolina, Wisconsin, and California State Employment Services, and
review of the completed study: Dr. Beatrice Dvorak, Albert Mapou,
Charles Meigh and Sylvia [illegible]. For the National Machine
Accountants Association, Mr. Wayne Spielman directed the [illegible]
activities among NMAA membership.


The Criterion. The criterion was a rating scale² which included items considered by selected Tabulating Machine Supervisors to be important for successful work performance as a Tabulating Machine Operator.

Supervisors were instructed to rate operators 
in comparison with Tabulating Machine Opera- 
tors “in-general.” This instruction was used to 
obtain, as nearly as possible, comparability of 
ratings among the participating firms. A re-rat- 
ing was conducted within a two-week period for 
the purpose of determining reliability. A reli- 
ability coefficient of .878 with a standard error 
of .004 was obtained. Since re-ratings were not available for the entire sample, the first rating was used as the criterion.

The rating scale was composed of 8 items for which the rater had five choices of response indicating the degree of performance of the operator.
Weights of 1 through 5 were assigned to these 
responses so that the minimum possible score 
was 8 and the maximum was 40. The mean 
score was 26.05 with a standard deviation of 6.7 
and the range was 8 through 40 for the sample 
of 203 operators. 

All operators having scores one standard deviation below the mean, or lower, were placed in the Low criterion group. Therefore, 37 operators comprise the Low group and 166 operators the High criterion group.
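The scoring arithmetic described above can be sketched as follows. The item count, weights, mean, and standard deviation are taken from the text; the names and the function `criterion_group` are illustrative assumptions, not part of the original study materials.

```python
# Sketch of the criterion scoring described in the text.
# Figures (8 items, weights 1-5, mean 26.05, S.D. 6.7) come from the article;
# all names below are hypothetical.
N_ITEMS = 8
WEIGHTS = range(1, 6)  # response weights 1 through 5

min_score = N_ITEMS * min(WEIGHTS)  # lowest possible total
max_score = N_ITEMS * max(WEIGHTS)  # highest possible total

MEAN, SD = 26.05, 6.7
cutoff = MEAN - SD  # one standard deviation below the mean, about 19.35


def criterion_group(score):
    """Return 'Low' for scores at or below the cutoff, otherwise 'High'."""
    return "Low" if score <= cutoff else "High"


print(min_score, max_score, round(cutoff, 2))
```

Since rating totals are whole numbers, this rule places totals of 19 or less in the Low group and totals of 20 or more in the High group.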

The Predictive Instrument. The machine-scorable form of the General Aptitude Test Battery (GATB) was used for this experimental study. This battery, composed of 12 tests, measures 9 aptitudes, viz., general intelligence (G), verbal ability (V), numerical ability (N), spatial aptitude (S), form perception (P), clerical aptitude (Q), motor coordination (K), finger dexterity (F), and manual dexterity (M). The general working population norms are established on the basis of a sample of 4000, stratified to obtain proportional occupational representation as shown by the 1940 Census of the Population. The general-population means for aptitudes in the battery are 100, with standard deviations of 20.

² The Staff also wishes to acknowledge the impetus given the study by Kenneth Schenkel whose Ph.D. research was the 1952 study. Through Dr. Schenkel came the contacts with the NMAA and, apart from the standard USES materials and approach, the rating scale and accessory materials (modified) developed by him were used for this study.

Statistical Analysis. The significance of GATB 
aptitudes for the occupation of Tabulating Ma- 
chine Operators was determined on the basis of 
mean aptitude scores, standard deviations, va- 
lidity coefficients, and job analysis data, as shown 
in Table 1. 


Results 


Aptitudes significantly related to success in 
the occupation as evidenced by high mean 
scores, low standard deviations, significant 
validity coefficients and identification through 
job analysis are: (G) general intelligence; 
(N) numerical ability; and (Q) clerical apti- 
tude. Spatial aptitude (S) is also related to 
the occupation as indicated by the validity 
coefficient, identification through job analysis, 
and because it adds to the selective efficiency 
of Aptitudes G, N, and Q. 

Minimum scores for Aptitudes G, N, Q, 
and S were set approximately one sample 
standard deviation below the sample mean 
rounded to the nearest five-point score level.
This results in norms consisting of G-95, 
N-95, S-85, and Q-100. 

To evaluate the selective efficiency of these 
norms in terms of the relationship between 


Table 1

Means (M), Standard Deviations (S.D.), Pearson Product-Moment Correlations with the Criterion (r) for the Aptitudes of the GATB

Note: N = 203 Tabulating Machine Operators.

Aptitude    M        S.D.           r
G           111.4    14.4           .34
V           109.1    15.1           .22**
N           111.6    14.8           .36**
S           106.5    18.3           .20**
P           109.9    13.9           .10
Q           116.4    [illegible]    [illegible]
K           112.0    16.4           .08
F           105.6    19.9           .10
M           106.7    20.9           .10

* Significant at the 5% level.
** Significant at the 1% level.



Table 2

Relationship Between Pass-Fail on Test Norms Consisting of Aptitudes G, N, S, and Q with Critical Scores of 95, 95, 85, and 100, respectively, and the Criterion

Group    Fail    Pass    Total
High     40      126     166
Low      20      17      37
Total    60      143     203

r_tet = .48; σ_r_tet = .14
χ² = 11.643, P/2 = .001


those operators passing and failing the norms and those in the High and Low criterion groups, tetrachoric correlation and Chi-square techniques were employed. The relationship between test norms and the criterion is shown in Table 2.

Both the Chi-square test and the tetrachoric correlation indicate a statistically significant relationship between passing the test norms and success on the job, as measured by the criterion. Fifty-four per cent of the Low criterion group fail the norms, while 76 per cent of the High group pass the norms.
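The Chi-square value and the two percentages can be checked from the cell counts of Table 2 (High: 40 fail, 126 pass; Low: 20 fail, 17 pass). The sketch below assumes that the reported Chi-square includes Yates' continuity correction; the article does not say so, but that form of the statistic reproduces the printed figure. The function name is illustrative.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell counts as read from Table 2.
high_fail, high_pass = 40, 126
low_fail, low_pass = 20, 17

chi2 = yates_chi_square(high_fail, high_pass, low_fail, low_pass)
pct_low_fail = 100 * low_fail / (low_fail + low_pass)
pct_high_pass = 100 * high_pass / (high_fail + high_pass)

print(round(chi2, 3), round(pct_low_fail), round(pct_high_pass))
```

The tetrachoric correlation requires an iterative or tabled solution and is not attempted here.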
Cross-Validation. Previously derived norms based on the original Minnesota sample, and on samples of independent studies conducted in Ohio and Florida, were applied to the national sample. Although these norms were related to job success, they were not as predictive of job success for the national sample as they were for the samples from which they were derived. In general, the same aptitudes appeared to have predictive value for each of the studies, but some variation was found with respect to the critical scores obtained.


Summary 


This study reports the development of national norms, based on the GATB, for the occupation of Tabulating Machine Operator.

General intelligence, numerical aptitude, spatial aptitude, and clerical ability were found to be significantly related to success in the occupation.


Received September 10, 1953.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


Comparative Validities in Clerical Testing 


Edward N. Hay 
Edward N. Hay & Associates, Inc., Philadelphia, Pa. 


In connection with another project the Hay Number Perception Test and the Wonderlic Personnel Test were administered to 19 candidates for a special task in a life insurance company. It was observed that there was an extraordinary number of high Personnel Test scores. Out of 19 people [illegible] had scores of 40 or more, which is above the 98th percentile in similar groups. The mean for this group was 35.6; about the 95th percentile. This insurance company had been using the LOMA No. 2A test, which is used for the selection of clerical workers. A correlation of .66 was found between Personnel Test and LOMA No. 2A, but an r of [illegible] between Personnel Test and Number Perception Test. Thus LOMA No. 2A seemed to be excluding any except fairly bright applicants.

Now it is known that the LOMA No. 2A is a good predictor of success in simple routine clerical work, as well as of promotability. The Number Perception Test has established its efficiency in routine clerical selection and consistently correlates very low with mental ability tests. The question immediately presented itself as to which of these two tests, LOMA No. 2A or Hay Number Perception, would show the higher validity in this company in predicting speed of production for low-level clerks. This first group afforded no criterion of success, since it was a mixed group with the majority of the employees in supervisory or technical positions. So, another group, engaged in simple routine clerical work, was selected in order to determine the validity of these two tests. The SRA Clerical test was also available so it was administered, too. The subjects were the 24 clerks in one department, all but two in the lowest pay classifications and performing simple routine tasks. Of these, 23 were women and none had had any supervisory responsibility. Average length of service was 37 months, with six over 5 years, nine between 1 and 5 years, eight under 1 year and one at five months. Correlation between length of service and the supervisor's ratings described below yielded a coefficient of -.08.

The Tests. LOMA No. 2A is a test avail- 
able only to life insurance companies. It is 
an omnibus work-limit test in six parts: 
checking, directions, same-opposites, prov- 
erbs, arithmetic and spelling. Score is a 
combination of time and errors. Adminis- 
tration time averages about 35 minutes. 

Wonderlic Personnel test is a well-known 
mental ability test composed of a variety of 
verbal and numerical problems. 

SRA Clerical is in three parts, speeded and 
timed separately. Vocabulary is a 5-minute 
test of 48 items. Arithmetic allows 15 min- 
utes for 24 problems of numerical reasoning. 
Checking is a 5-minute coding test of 144 
items. 

Hay Clerical Battery is composed of three 
speeded tests of 4 minutes each. Number 
Perception has 200 pairs of three- to six-digit 
numbers, the task being to check those that 
are the same. Name Finding requires the 
subject to look at a name and remember it 
well enough to pick it out of a group of four 
similar names on the back of the sheet. 
Number Series consists of 30 simple number 
series completion problems. 


The Criterion. The criterion was the average of the ratings made by the department head and assistant department head. They were made about three weeks apart and wholly independently. The rating method employed three rating principles in combination: graphic scale, man-to-man comparison and forced distribution. All 24 names were listed on a single rating sheet described as “Speed of Working” and the rater was asked to place a check mark on the line opposite each employee’s name in such a way that approximately one-half of the names were checked in a vertical band designated as “Average,” about one-fourth “Above Department Average” and about one-fourth “Below Department Average.” Distinctions among employees in each of these three groups were to be indicated by the relative positions of the check marks along the lines. After the ratings were completed the value of each mark was measured on a scale ranging from 0 to 40, this particular scale being arbitrary.

The product-moment correlation between 
the two sets of ratings yields an r of .89, in- 
dicating a highly reliable criterion. 


Results 


The second column in Table 1 shows the 
correlations between scores on the various 
tests and the average of the scaled values of 
the two ratings. These coefficients point to 
the greater efficiency of four of the tests, but 
such coefficients cannot always be relied upon, 
especially in so small a sample, because re- 
gression is not always rectilinear. 

The third column of Table 1 shows the best cutting score on each single test or combination of tests. The 24 cases fell into five groups as rated by the two supervisors. Group I was rated “Good” by both raters; group II was rated “Good” by one rater and “Average” by the other, etc. The first three groups were considered “Good.” Groups IV and V were considered “Poor” since one or the other rater had so classified all members. Cutting scores were selected by inspection which would admit the greatest number of subjects rated “Good” to that group and exclude the greatest possible number rated “Poor.”
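The selection of a cutting score by inspection can be imitated with a simple exhaustive search. This is only a sketch of the idea, not the author's procedure: the function name and the eight scores and ratings below are hypothetical, and a subject is assumed to be predicted “Good” when the score is at or above the cut.

```python
def best_cutting_score(scores, rated_good):
    """Return (cut, n_correct) for the cut that assigns the most subjects correctly.

    A subject is predicted 'Good' when score >= cut. Ties keep the lowest cut.
    """
    best_cut, best_correct = None, -1
    for cut in sorted(set(scores)):
        # Count subjects whose predicted group matches the rated group.
        correct = sum((s >= cut) == g for s, g in zip(scores, rated_good))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut, best_correct


# Hypothetical test scores and "Good" (True) / "Poor" (False) ratings.
scores = [90, 110, 120, 125, 135, 140, 150, 160]
rated_good = [False, False, True, False, True, True, True, True]

cut, n_correct = best_cutting_score(scores, rated_good)
print(cut, n_correct, "of", len(scores))
```

A cut chosen this way is fitted to the sample at hand, so the count of correct assignments will tend to be optimistic until the cut is tried on new cases.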

The only combination of tests which would increase predictive efficiency was Number Perception and Name Finding, which correctly assigned 21 out of 24 subjects to the proper group, “Good” or “Poor.” This is significantly better than chance at the [illegible] per cent level.


Discussion 


This study confirms other similar studies with some of the same tests in showing that prediction of success in low-level routine clerical work is usually more efficiently accomplished by tests based on what appears to be speed of perception than by tests involving primarily reasoning problems (1, 2, 3, 4, 5, 6, 7, 8).


Table 1

Predicting “Speed of Work” from Test Scores
N = 24

Test                                    Time, Min.   r             Best Cutting Score   Correct Selection¹   Significance
Wonderlic Personnel                     12           .04           [illegible]          [illegible]          No
SRA Clerical: Vocabulary                5            .08           [illegible]          [illegible]          No
              Arithmetic                15           -.05          13                   15 of 23             No
              Coding                    5            [illegible]   [illegible]          [illegible]          .05
Hay: Number Perception                  4            [illegible]   [illegible]          18 of 23             .10
     Name Finding                       4            .60           [illegible]          [illegible]          .10
     Number Series                      4            .04           [illegible]          18 of 24             .10
LOMA No. 2A²                            35           [illegible]   [illegible]          [illegible]          .10
Hay: Number Perception + Name Finding                .67³          [illegible]          21 of 24             [illegible]
SRA Clerical: Vocab., Arith., Coding    25           [illegible]   [illegible]          18 of 23             .05

¹ Chance would give [illegible].
² No correction has been [illegible]. Range in sample, however, was as great as for other tests given, judging by [illegible].
³ Multiple R.

It is worthy of note that the most efficient tests were also the briefest: Number Perception, Name Finding and SRA Checking. This points to the wasted effort of giving a large battery of tests, tests with long time requirements or an omnibus test, where some material may be only dead wood and may even reduce the efficiency of the whole test. Time is not very important in school situations but in industry it is critical, both for maintaining good public relations and in reducing the direct costs of testing.

Warning has already been given against placing complete reliance on product-moment coefficients of correlation, on the ground that if regression is not rectilinear the coefficient may thereby be lower than would be expected. Table 1 affords an example. Number Series correctly selects 18 out of 24 cases, the same figure achieved by three other tests and nearly as high as a fourth; yet the r is only .04, whereas the others are between .54 and .64. An examination of the scatter-diagrams provides the explanation: regression follows a U-shaped course for Number Series but is almost perfectly rectilinear for the other four tests.
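The point about curvilinear regression can be illustrated with a small computation. The data below are hypothetical and deliberately extreme: the second variable is completely determined by the first, yet the product-moment coefficient is zero because the relation is U-shaped. The function `pearson_r` is an ordinary textbook implementation written for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical test scores.
test_scores = list(range(1, 10))

# U-shaped relation: high criterion values at both extremes of the test.
u_shaped = [(s - 5) ** 2 for s in test_scores]

# Rectilinear relation for comparison.
rectilinear = [2 * s + 1 for s in test_scores]

r_u = pearson_r(test_scores, u_shaped)
r_linear = pearson_r(test_scores, rectilinear)
print(round(r_u, 2), round(r_linear, 2))
```

A scatter-diagram, as the text recommends, shows at once what the single coefficient conceals.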


Received May 28, 1954.
Early publication.


References

1. American Bankers Association, New York. Clerical Testing in Banks. 1952.
2. Blakemore, Arline. Reducing typing costs with aptitude tests. Personnel J., 1951, 30, 20-24.
3. Hay, E. N. Predicting success in machine bookkeeping. J. appl. Psychol., 1943, 27, 483-493.
4. Hay, E. N. Cross-validation of clerical aptitude tests. J. appl. Psychol., 1950, 34, 153-158.
5. Hay, E. N. Test scores and ratings of clerks at the Roane-Anderson Co. Unpublished, 1950.
6. Hay, E. N. Mental ability tests in clerical selection. J. appl. Psychol., 1951, 35, 250.
7. Howe, D. W. Summary of test validation studies at the Hanover Bank. Unpublished, 1950.
8. Miller, R. B. Reducing the time required for testing clerical applicants. Personnel J., 1950, 28, 364-366.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


A Sales Comprehension Test * 


Martin M. Bruce 


Dunlap and Associates, Inc., Stamford, Conn. 


One of the areas in industry where testing 
has been carried on extensively is sales. 
However, very few instruments dealing with 
selling have been published and are available 
for general distribution. 

The reader will find an excellent review of 
the literature on selection of sales personnel 
in Husband’s article (3). Since that pub- 
lication Rock (4) has published an article 
on his Sales Situations Test. Rock reported 
on just two small sales groups in describing 
the test, one consisting of 25 subjects, the 
other 31 subjects. The instrument attempts 
to present “live” situations, with items in 
multiple choice format. The idea has long 
appeared to the writer to be a sound one. 


Because of the apparent dearth of testing 
material in a field where a great many men 
are tested, the writer set about in 1946 to 
devise a test that would aid in measuring 
potentiality for success in selling. 


Problem 


The problem was one of constructing a test 
that would aid in predicting success in sell- 
ing. Specifically, the test was to be one that 


would be directly applicable to the whole- 
sale sales field in general. 


Procedure 


In 1946 an experimental form of the test was 
prepared in mimeograph format. This instru- 
ment contained 74 items. The items were con- 
structed with the aid of salesmen in various 
fields, business men in occupations related to 
selling, industrial psychologists, and literature on 
selling. 

The 74 items were administered to salesmen in 
various fields throughout the country as well as 
to individuals in occupations other than sales. 
The 50 items that differentiated best between 
the sales and non-sales groups were retained and 
published as the Aptitudes Associates Test of 
Sales Aptitude (Principles of Selling) Form A 
(2).


* The Sales Comprehension Test, Form M by Mar- 
tin M. Bruce is obtainable from the author at 71 
Hanson Lane, New Rochelle, New York. 




Additional data on 1,404 cases were collected on the 50-item form. These cases consisted of 1,007 non-salesmen and 397 salesmen. The non-salesmen consisted of individuals applying for all types of jobs other than sales with firms in the East and Midwest, students studying psychology in New York and New Jersey colleges, vocational guidance clients in New York City and men in various non-sales jobs throughout the country. The sales group consisted of 55 salesmen of major and small electrical appliances in cities in Ohio and Connecticut; 86 salesmen and sales managers of electronics products located in practically all common distribution centers in the United States; 19 metropolitan New York salesmen of office dictating equipment; 13 salesmen of hardware products located in Southern and Midwest locations; and 224 other individual salesmen in a wide variety of fields located in various sections of the country. This last group included salesmen in the following fields: office supplies, whiskey, beer, soap, razor blades, fountain pens, automatic pencils, clothing, textiles, furniture, dairy products, advertising space, pharmaceuticals, books, materials handling products, machinery, and a number of others. Phi coefficients were computed for the 50 items on the basis of the above samples. The 30 best items were retained and published as the Sales Comprehension Test, Form M.

A cross-validation study was conducted by administering the 30-item form to 661 additional non-salesmen and 334 salesmen. The non-salesmen were in 22 different states and filled 21 different jobs. The salesmen were employed in [illegible] different states and were employed in 11 different sales fields.

An additional validity study was conducted with a group of 82 sales managers employed throughout the United States by a door-to-door cosmetics sales firm. These sales managers supervise a group of full and part time saleswomen who sell on a commission basis.


Results 


Validity. Computations for the sales and non-sales groups containing the original 397 and 1,007 cases, respectively, yielded a t of 13.1. This finding suggests that there is less than one chance in 100 that the means of these samples are not significantly different. However, this measure of difference is obviously high since it is based on the same population from which the Phi coefficients were computed. The means were 30.8 for the sales population and 11.1 for the non-sales population. The SD’s were, respectively, 13.8 and 18.7.

The cross-validation populations of 661 non-salesmen and 334 salesmen yielded a t of [illegible]. This statistic brings us beyond the 1% level of confidence in assuming that the sales and non-sales populations are not similar in their responses on this test. This is an indication of the test’s status validity (5). Overlapping amounts to 19% in these populations, this percentage of the non-sales group equalling or exceeding the median of the sales group. In this cross-validation population the means of the sales and non-sales groups were, respectively, 28.9 and 12.2; the sigmas were 12.2 and 16.9; the medians 29 and 12.
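The overlap measure used here, the percentage of one group equalling or exceeding the median of the other, is easily computed. The sketch below follows that definition; the function name and the two short lists of scores are hypothetical and are not the study data.

```python
from statistics import median


def overlap_percent(reference_scores, comparison_scores):
    """Percent of the comparison group scoring at or above the reference group's median."""
    reference_median = median(reference_scores)
    n_at_or_above = sum(score >= reference_median for score in comparison_scores)
    return 100 * n_at_or_above / len(comparison_scores)


# Hypothetical scores; negative totals are possible on this test.
sales = [10, 22, 29, 35, 41]                      # median 29
non_sales = [-5, 3, 8, 12, 15, 18, 21, 25, 30, 33]

overlap = overlap_percent(sales, non_sales)
print(overlap)
```

With this definition, two groups drawn from the same population would be expected to show an overlap of about 50%, so smaller values indicate better separation.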

In a study conducted with the sales force of a nation-wide electronics sales firm six tests were completed by the 86 salesmen and sales managers. These included personality inventories, mental ability and other ability tests and an interest inventory. The Sales Comprehension Test correlated higher with a rating criterion than any of the other five tests employed in the battery. The r was [illegible]. The criterion has an uncorrected odd-even reliability of .92.

In this group the mean scores for the 77 salesmen and 9 sales managers were compared by computing t. The t of 2.4 is significant at the 2% level. There is a 31% overlap here, using the same overlap measure as above. Assuming that sales managers as a rule have better sales comprehension than salesmen, the indication that the Sales Comprehension Test measures this difference further suggests validity for the test.

Scores on this test were correlated with final grades of 27 students studying salesmanship at Rutgers University. The r proved to be .68, suggesting that this test measures comprehension similar to that gained by students studying salesmanship in school.

Correlation with Intelligence. It is a common finding that abilities tend to be positively correlated. A particular finding is that tests employed in the same situation tend to correlate positively with each other and especially with tests of intelligence (1). Statistical analysis usually reveals that various paper and pencil tests actually measure to a significant extent what intelligence tests measure. Therefore, it is important to know the extent to which this test is related to measures of intelligence.

A correlation was run between the total 
score on the Sales Comprehension Test and 
the total score on the Otis Self-Administering 
Test of Mental Ability, Higher Examination: 
Form A. The correlation based on a sample 
of 387 men, women and salesmen was — .19. 
This group was composed of college psychol- 
ogy students studying testing, job applicants 
and vocational guidance clients. In this 
group the standard deviation of Otis raw 
scores was 9.7 and the standard deviation of 
Sales Comprehension Test raw scores was 
18.7. The means were, respectively, 56.7 and 
19.8. 

Further research was conducted with the 
aid of Thurstone’s Primary Mental Abilities 
Test which contains five factors. The 173 
subjects include 159 men and 14 women. 
All but four of the men and two of the 
women were evaluated for clerical, sales, 
managerial or engineering positions with vari- 
ous firms in the East. 

The findings appear in Table 1. 

The Sales Comprehension Test score mean 
and standard deviation were, respectively, 
17.2 and 16.9. 

Since all of these correlations are close to zero and the correlation with the Otis is low and negative, it appears justified to state that measures of various intelligence factors and the Sales Comprehension Test are not related. The Sales Comprehension Test appears to measure something other than intelligence.


Table 1

Relationships Between Sales Comprehension Test and PMA Test

PMA Factor        N      Mean     Sigma    r with Sales Test
Verbal Meaning    173    40.4     7.8      .06
Space             173    24.6     9.5      -.20
Reasoning         173    17.2     4.9      -.05
Number            170    46.7     11.5     -.08
Word Fluency      170    37.0     14.2     .02
Total Score       170    224.9    47.8     -.05

Correlation with Persuasive Preference. There appears to be a positive linear relationship between persuasive preference as measured by the Kuder Preference Record, Form CH, and performance on the Sales Comprehension Test. Data on these two tests were obtained for 146 non-salesmen and 54 salesmen. The r proved to be .39, significant at the 1% level. The standard deviation of persuasive preference scores was 15.8 while the standard deviation of Sales Comprehension Test scores was 17.1. The respective means were 37.7 and 17.8.

The modest but positive and significant r between sales score and persuasive score is in keeping with the concept that people tend to learn in areas in which they are interested.

Reliability. Reliability data in the form of tests and retests were obtained from 103 college students. Scores ranged from -36 to 47. The mean of the first testing was 10.4 and for the second it was 11.1. The standard deviations were, respectively, 15.8 and 14.9. The test-retest reliability coefficient for this group was .71. Because this is a restricted group with respect to range of scores, it seems likely that the true test-retest reliability coefficient for the entire population, including salesmen, is somewhat higher. The r is .79 when corrected for homogeneity.
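The article does not state which formula or which wider-range standard deviation was used in correcting the reliability coefficient for homogeneity. As a hedged illustration only, the sketch below applies a common correction that assumes the error variance is the same in the restricted and the wider group, R = 1 - (s² / S²)(1 - r). The wider-range standard deviation of 18.7 is an assumption borrowed from the non-sales figure reported earlier; with it the formula happens to reproduce the reported .79.

```python
def correct_for_homogeneity(r_restricted, sd_restricted, sd_full):
    """Estimate reliability in a wider-range group, assuming equal error variance.

    R = 1 - (sd_restricted**2 / sd_full**2) * (1 - r_restricted)
    """
    variance_ratio = sd_restricted ** 2 / sd_full ** 2
    return 1 - variance_ratio * (1 - r_restricted)


r_students = 0.71       # test-retest r for the 103 college students (from the text)
sd_students = 15.8      # S.D. of the first testing (from the text)
sd_assumed_full = 18.7  # ASSUMPTION: S.D. of a wider-range group, not stated as such in the text

corrected = correct_for_homogeneity(r_students, sd_students, sd_assumed_full)
print(round(corrected, 2))
```

A different choice of wider-range standard deviation would give a different corrected value, so the agreement with .79 should not be read as confirming the author's method.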


Summary 


An experimental form of a test to aid in selecting and evaluating salesmen was prepared in 1946. Preliminary validity data led to the elimination of 24 of the 74 multiple choice items. Over a period of five years data were collected on the 50-item form. Data on 1,398 cases indicated that there were 30 items that significantly and reliably differentiated salesmen from non-salesmen. These items have been combined to form the Sales Comprehension Test, Form M. The test was cross validated on a supplementary population.

This instrument proved to be the most valuable in predicting success among salesmen and sales managers in a national sales organization. The test correlated significantly with final grades in a class in sales principles. The instrument, unlike many paper and pencil tests, does not measure intelligence to any extent. People who show a high preference for persuasive activities tend to do better on this test. The test appears capable of differentiating good from poor sales personnel.

A test-retest reliability coefficient, .79 corrected for homogeneity, is sufficiently high for group situations to warrant confidence in its consistency of measurement.

The Sales Comprehension Test, Form M, appears to be an instrument that can be utilized in sales selection and evaluation situations. Its validated item content also lends itself to sales training situations.


Received November 6, 1953.


References

1. Bruce, M. M. The prediction of effectiveness as a factory foreman. Psychol. Monogr., 1953, 67, No. 12 (Whole No. 362).
2. Buros, O. K. Fourth Mental Measurements Yearbook. Highland Park: Gryphon Press, 1953.
3. Husband, R. W. Techniques of salesmen selection. Educ. psychol. Measmt., 1949, 9, 129-148.
4. Rock, M. L. A sales situation test. J. appl. Psychol., 1951, 35, 331-332.
5. Technical recommendations for psychological tests and diagnostic techniques: preliminary proposal. APA Committee on Test Standards. Amer. Psychologist, 1952, 7, 461-475.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


A New Method for Obtaining Weighted Composites of Ratings 


H. F. Dingman 


and J. P. Guilford 


University of Southern California 


In spite of the many weaknesses of ratings of personnel obtained in the practical situation, they still often remain the only criterion against which to validate predictive measures. It is therefore important that we correct for weaknesses wherever we can in order to achieve the best information obtainable concerning the validity of selection instruments.

It has been amply demonstrated that in order to obtain increased reliability, and hence also probably increased validity, of single ratings, it pays to combine ratings from several raters. One of the common difficulties in this connection, however, is that no rater is acquainted with all the ratees in the experimental group. At best, not all raters are equally well informed concerning all ratees. It is also true, even when raters know ratees fairly well, that each rater uses different information and rates on different qualities. Under such conditions not all ratings should be given equal weight in forming composites. This report is concerned with the development of a method of weighting obtained ratings in terms of two rater characteristics. One is the rater’s tendency to rate on qualities in common with other raters and the other is the rater’s degree of confidence in his rating of particular individuals.

The problem of weighting ratings arose in connection with a project on the development of a new testing instrument designed for the selection of personnel who come under the general category of Psychiatric Technicians (Ward Aides) serving in a state institution.¹ A total of 716 such personnel in the same institution were under study. Each one had been rated by four different supervisors who had been in positions favorable for observing their performances. A graphic rating was given on a line seven centimeters long under the instruction to rate for general effectiveness on the job. Each rater also gave a rating on a similar line indicating his own degree of assurance that his rating of effectiveness was correct.² Some of these ratings would be zero or near zero where the raters felt that they had little or no basis for making the rating of effectiveness.

¹ This work was done as part of a project on the selection of psychiatric technicians, supported by a grant from the U. S. Public Health Service in contract with the Pacific State Hospital, Spadra, California.

Intercorrelations 


Before adopting any system of weighting 
the ratings to form a composite criterion 
measure, we decided to obtain as much in- 
formation as possible concerning the proper- 
ties of the ratings. This was accomplished 
through intercorrelations of raters and fac- 
tor analyses of both the effectiveness and the 
assurance ratings. 


Table 1

Intercorrelations of Effectiveness Ratings
(N = 716)

Rater    A      B              C              D
A               .54            .16            .53
B        .54                   [illegible]    .45
C        .16    [illegible]                   .08
D        .53    .45            .08

Table 1 shows the intercorrelations among the four raters, using the effectiveness ratings of all 716 employees. It is obvious that raters A, B, and D show about the same level of inter-rater agreement on ratings of effectiveness, while rater C shows little agreement with any of those three.

The factor analysis of the correlation matrix was carried out by the centroid method, with iterative solutions until communalities were stabilized. The results appear in Table 2. Here it is seen, first, that one common factor is sufficient to account for the intercorrelations. It can also be seen that rater A has definitely the highest communality. This is significant in view of the fact that A was a supervisor who makes the major decisions concerning work assignments and inter-ward transfers. Rater C had very little in common with the other raters. Whether this means that C did not know essentially the same employees as the others or rated them on different qualities we cannot tell from this information alone. Taken at its face value, we might well conclude that C’s ratings should receive less weight in a composite, if they were used at all.


Table 2

Loadings in the Single Common Factor in the Four Raters’ Ratings of Effectiveness

Rater    Factor Loading    Communality h²
A        .83               .68
B        .68               .46
C        .16               .03
D        .64               .41


The intercorrelations of assurance ratings are shown in Table 3. Since assurance may be assumed to be highly correlated with the degree of acquaintance between rater and ratee, we may conclude from Table 3 that raters B and C had the least in common with respect to ratees whom they knew or did not know. Rater A, who had the greatest communality in her ratings of effectiveness, knew more of the ratees in common with D than with B and C. The factor analysis gave a structure with two common factors.


Table 3

Intercorrelations of Ratings of Assurance Connected with the Effectiveness Ratings

[Table body not recoverable from the source.]


Table 4

Rotated Factor Loadings of the Raters with Respect to Their Ratings of Assurance

Rater    Factor I    Factor II    Communality h²
A        .31         .60          [illegible]
B        .06         .33          [illegible]
C        .72         .00          [illegible]
D        .64         .49          .66


Taking this information together with that from the analysis of the effectiveness ratings, we conclude that C’s lack of communality with the other raters was not due to the fact that he knew different employees. C disagreed with the other raters generally as to relative effectiveness of employees that were rated in common. This disagreement could mean that C emphasized different qualities or it could mean that he rated the same qualities but made different evaluations of employees with respect to those qualities. One might conclude that C’s ratings are so inconsistent with those of the consensus that they should not be included in a composite. On the other hand, perhaps C had some neglected valid qualities or some better evaluations to contribute. The best solution seems to be to include C’s ratings for what they are apparently worth, that is, to give them a relatively low weight.

² The idea of obtaining ratings of assurance was suggested by Anna Shotwell of the Pacific State Hospital staff.


The Weighting System t 

The weighting system we propose a 
we have used in connection with each 0 un 
Psychiatric Technicians takes into oe 0 
two variables. One is the factor inate in 
each rater in the single common facio rat- 
the effectiveness ratings. Each rater’s this 
ings, regardless of ratee, is multiplied by rat- 
weight. The other weight is the rater $ tee. 
ing of assurance that he applied to each ate 
The over-all weight to be applied to aa 
effectiveness rating is therefore a pas an 
these two values. The composite rating fo ec- 
employee is a weighted mean of the four ae 
tiveness ratings given him by the four 14 
In order to state more explicitly how 


i the 
(N = 716) weighted mean is computed we define 
following symbols: 
a a Let : 
A B (8 D . EE 
23 22 50 Xir = rating of effectiveness of individu 
A “ ~ . 
given by rater K kes 
B 23 04 2 aan 5 ter K mak 
G 22 04 46 Air = rating of assurance that ra divid- 
D 50 .21 46 


concerning his rating Xix of 3 
ual I, 


F 


Obtaining Weighted Composites of Ratings 


F, = general-factor loading of rater K in 
his effectiveness ratings,’ 
and 


X, = weighted mean of effectiveness rating 
for individual I. 


The equation reads 


$$\bar{X}_i = \frac{\sum_k A_{ik} F_k X_{ik}}{\sum_k A_{ik} F_k}$$


The summations in both numerator and
denominator are over all raters.
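
The weighted mean can be illustrated with a brief computational sketch. It assumes one effectiveness rating, one assurance rating, and one factor loading per rater. The loadings are those reported in Table 2; the ratings for the single ratee, the scale they are on, and the function name are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def weighted_composite(ratings: Sequence[float],
                       assurance: Sequence[float],
                       loadings: Sequence[float]) -> float:
    """Weighted mean of one ratee's effectiveness ratings.

    Each rating X_ik is weighted by the product of the rater's
    assurance rating A_ik and the rater's general-factor loading F_k.
    """
    if not (len(ratings) == len(assurance) == len(loadings)):
        raise ValueError("need one rating, assurance, and loading per rater")
    weights = [a * f for a, f in zip(assurance, loadings)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero; the composite is undefined")
    return sum(w * x for w, x in zip(weights, ratings)) / total


# General-factor loadings of raters A, B, C, D (Table 2).
LOADINGS = [0.83, 0.68, 0.16, 0.64]

# Hypothetical ratings for one ratee (illustration only).
ratings = [7, 6, 3, 8]      # effectiveness ratings from A, B, C, D
assurance = [5, 3, 1, 4]    # assurance each rater attached to the rating

composite = weighted_composite(ratings, assurance, LOADINGS)
unweighted = sum(ratings) / len(ratings)
```

In this made-up case rater C's divergent rating carries little weight, because both C's loading and C's assurance are low, so the composite lies closer to the ratings of A and D than the simple average does.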


Reliability of the Composites 


In order to determine whether the weighting
system leads to improvement over a
simple summation or average of ratings, we
have made a reliability study of composites
derived with and without weights. Reliability
is defined here as inter-rater consistency
or inter-composite consistency. It is impossible
to estimate the reliability of composites
of all four ratings, but it is possible
to estimate reliabilities for composites of two
raters at a time. Consequently, the raters
were combined in all possible pairs of two
and for each pair a weighted and an unweighted
composite were computed for each
ratee. The three possible intercorrelations of
the composites, weighted and unweighted,
are given in Table 5, based upon 50 randomly
selected ratees. In every case, the
weighted composites show a higher intercorrelation;
in two cases very much higher.
We have not applied the Spearman-Brown
formula to estimate the reliability of composites
of four raters for the reason that the
[...] formula are [...] the weights, we have a
criterion sufficiently dependable for use in a
validation study. In view of the two very low
correlations for the unweighted composites,
there is some question as to whether an unweighted
composite of four ratings would be
sufficiently dependable to serve as a criterion.


Table 5
Correlations Between Unweighted and Weighted Composites
of Ratings of Effectiveness Assigned
by all Possible Pairs of Raters *

Pairs of        Unweighted      Weighted
Raters          Composites      Composites
AB vs. CD       -.04            .54
AC vs. BD        .58            .64
AD vs. BC       [illegible]     .54

* From a random sample of 50 ratees.


3 If there should be more than one [...] an analysis, the investigator
has at least two [...]. One would be to use the first centroid
loadings. This would be pre[...] would be particularly weak. The
other would be to use the loadings from each rotat[...] separately as
a set of weights, [...] criterion measure corresponding [...]. Unless
these weights were very [...], however, the two criteria would be
highly correlated.


Summary 


This article faces two problems: (1) the 
fact that different raters in a practical situa- 
tion do not know employees equally well and 
thus cannot rate them with equal assurance; 
and (2) the fact that raters differ with re- 
spect to how well they reflect the consensus 
of the group of raters. The ratings of ef- 
fectiveness of 716 hospital employees given 
by four supervisors were studied by factor 
analysis to determine what their consensus 
indicated. One common factor, in which raters 
had quite different factor loadings, was suf- 
ficient to account for the intercorrelations of 
effectiveness ratings. In rating each em- 
ployee, each rater also gave a rating of de- 
gree of his assurance of his correctness. 

A factor analysis of these assurance rat- 
ings gave two common factors, which were 
taken to indicate communalities of acquaint- 
ance with the employees. The results of the 
two factor analyses led to the inclusion of the 
ratings of all raters and to the use of weights 
in forming composite ratings. One weight 
was the factor loading of the rater obtained 
from intercorrelations of the effectiveness 
ratings. The other weight was the rating of 
degree of assurance. The composite was a 
weighted mean of the four ratings of each 
employee. It was demonstrated that weighted 
composite ratings based on this principle were 
definitely more reliable than corresponding
unweighted composites.


Received October 11, 1953. 


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


A Technique for Keying Items of an Inventory to be Added to an 
Existing Test Battery 


Charles O. Neidt 


University of Nebraska 


and 
John P. Malloy 


Marquette University 


In developing a quantitative scoring pro- 
cedure and in selecting items for a test which 
elicits item responses that cannot be readily 
classified as correct or incorrect, test con- 
structors have customarily used one or a 
combination of three procedures. The first 
procedure is that of having authorities or 
“juries” select the item response which they 
believe parallels a definition of the behavior 
being evaluated. The resulting individual 
item validity and total test validity are thus 
dependent upon the judges’ interpretation of 
the defined behavior. The second procedure 
is that of assigning larger values to those re- 
sponses internally consistent with the total 
score. The validity of items keyed and se- 
lected according to this procedure depends 
upon the validity of the total score. The 
third procedure involves constructing a key 
and selecting items after correlating each 
possible item response with an external criterion,
usually some behavior display or rating
of the subjects. Insofar as subsequent
prediction of behavior external to the test
score is concerned, the external criterion technique
contains inherent advantages. If the
criterion against which the items have been 
validated is a heterogeneous criterion, how- 
ever, the test will also tend to be hetero- 
geneous. Item selection techniques proposed 
by Horst (6), Gulliksen (5), Davis (1), and 
French (4), which combine elements of the 
second and third procedures, tend to reduce 
the heterogeneity of the test. 

When a battery of tests is used for the 
prediction of a criterion, maximum predictive 
effectiveness will occur when each test in the 
battery has a high correlation with the cri- 
terion and a low intercorrelation with the 
other tests in the battery. Thus if a new 


test is to be combined with previously available
tests for the prediction of some criterion,
then the items in the new test should measure
some part of the criterion not already
being measured. When individual items in
a new instrument are validated against a
criterion, the test constructor is usually assured
of some subsequent predictive effectiveness
when the test is used singly for prediction.
If a test validated in such a manner
is added to a battery, however, the test constructor
has no assurance that the test will
increase the total predictive effectiveness of
the battery. The reason for this lack of assurance
may be that the extent to which the
items in the new test intercorrelate with the
other tests in the battery has not been considered.
The desirability of a technique
which takes into consideration that variation
in the criterion already associated with other
prediction variables is readily apparent.
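
The dependence of battery effectiveness on the intercorrelation of its tests can be illustrated with the standard formula for the multiple correlation of a criterion with two predictors. The sketch below is an illustration only: the function name and the correlation values are hypothetical and are not taken from this study.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictor intercorrelation must lie strictly within (-1, 1)")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: a new test with validity .45 is added to an
# existing test with validity .50. Only the intercorrelation differs.
r_overlapping = multiple_r(0.45, 0.50, 0.45)   # new test overlaps the old one
r_independent = multiple_r(0.45, 0.50, 0.30)   # new test overlaps less
```

With these made-up figures the battery containing the less overlapping test yields the larger multiple correlation, although the new test has the same validity in both cases.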
It was the purpose of this study to determine
the relative effectiveness of: (1) keying
the items of a new inventory to be added
to a test battery in terms of their correlation
with the total variation of an external criterion;
and (2) keying the same items in
terms of their correlation with the criterion
variation unexplained by other tests in the
battery.


The Techniques 

The External Criterion Technique. Because
the external criterion technique of
keying item responses to attitude, interest,
personality, and biographical data instruments
has been widely used for the last
twenty years, specific instances of its application
will not be cited here. Essentially this
technique consists of obtaining the correlation
between each item response of a key
group and a criterion. If a test contains 20
items each having four possible responses,
then 80 correlations are obtained. The key
is constructed by assigning quantitative values
to subsequent responses according to the size
and/or direction of the correlations. With
such a procedure more than one response can
be keyed for each item. The total score for
subsequently administered tests is then obtained
by summing the values assigned to
the item responses of each subject according
to the key. The desirability of checking the
validity of the total score for members of another
sample, independent of the key group,
should be obvious.
The Deviate Technique. The procedure
of keying item responses according to their
correlation with the unexplained criterion
variation, here referred to as the deviate technique,
is much less well known than the external
criterion technique. Instances in which
the deviate technique has been used include
the research of Neidt and Merrill (10) and
Neidt and Edmison (9). In an article published
in 1951, Meyers and Schultz (8) described
a modified version of this technique,
and in an article appearing in 1953, Schultz
and Green (11) reported the use of the deviate
technique in a way similar to that used
in the present study.

In constructing a key with the use of the
deviate technique, the responses of a key
group to each item are correlated with that
part of the criterion variation which is not
associated with other test scores in a battery.
In the analysis of regression of a test
battery and a criterion, the criterion variance
unexplained by other tests can be expressed
for any group as follows:

$$\sum y^2 - \left[a_1 \sum x_1 y + a_2 \sum x_2 y + \cdots + a_m \sum x_m y\right]$$

where $\sum y^2$ is the criterion sum of squares,
the $\sum x_j y$ are the sums of the cross products of
the test scores and the criterion in deviation
form, and the a's are regression weights determined
by least squares. The foregoing
expression can be readily changed to raw score
form. For any individual in the group for
which the regression weights have been determined,
an indication of the unexplained
variation may be obtained from Y − Ŷ, in
which Y is the actually obtained criterion
measure, and Ŷ is the criterion measure predicted
for this individual from scores in the
test battery. After prediction and subtraction
from the actual criterion measures have
been made for each individual in a key group,
a distribution can be formed which represents
that variation in the criterion that is unaccounted
for by the tests in the battery. This
distribution will be distributed around zero
and its shape, although influenced by the
shape of the criterion distribution, will tend
toward normality. It is this distribution of
actual-minus-predicted criterion measures with
which item responses are correlated in the
use of the deviate technique.
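
The formation of actual-minus-predicted measures can be sketched as follows. The regression weights are those of the equation reported under Development of the Keys. Everything else is an assumption made for illustration: the six students and their scores are hypothetical, and an ordinary product-moment correlation with a 0/1 response indicator is substituted for the estimate from Flanagan's table that the study itself used.

```python
import math
from statistics import mean


def predicted_mark(english: float, ace_l: float) -> float:
    """Criterion predicted from the two battery tests (converted scores)."""
    return 0.197 * english + 0.195 * ace_l + 4.125


def deviates(marks, english, ace_l):
    """Actual-minus-predicted criterion measures, one per student."""
    return [y - predicted_mark(e, a) for y, e, a in zip(marks, english, ace_l)]


def pearson_r(xs, ys) -> float:
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical key-group data (illustration only).
marks = [7, 5, 6, 4, 8, 3]        # first-semester average course marks
english = [6, 5, 7, 4, 7, 4]      # English Placement scores
ace_l = [5, 6, 6, 5, 8, 3]        # ACE-L scores
chose = [1, 0, 1, 0, 1, 0]        # 1 if the student chose a given response

devs = deviates(marks, english, ace_l)
r_deviate = pearson_r(chose, devs)     # basis of the deviate technique key
r_external = pearson_r(chose, marks)   # basis of the external criterion key
```

Because the weights were not fitted to these made-up scores, the deviates here do not center on zero as they would in an actual key group.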


Procedure 


Collection of Data. A 201-item life experi- 
ence and attitude toward education inventory, 
constructed by Malloy (7), was administered to 
309 freshman women entering the University of 
Nebraska in September, 1952. Of the 201 items 
in the inventory, 112 were of the multiple-choice 
type and 89 were of the paired-statement type. 
The items were designed to reflect experiences 
and attitudes in four areas, viz., school experi- 
ences and attitudes toward education, self ap- 
praisal, family relationships, and choice of friends. 

The 309 students were subdivided into two in- 
dependently drawn random subsamples of 155 
and 154 students each. The sample containing 
155 students was designated as the key group 
and the sample of 154 was designated as the 
cross validation group. 

Since the inventory was constructed to be used 
with a battery of two other preregistration tests,
scores on these tests were obtained for both 
groups. The two preregistration tests involved 
were the American Council on Education Psy- 
chological Examination, Linguistic subtest, and a 
local English achievement test, entitled the Eng- 
lish Placement test. Raw scores are customarily 
converted to a one-to-nine scale at the Univer- 
sity of Nebraska and these converted scores were 
used in this investigation.

The criterion used in this study was first-semester
average course mark. Course marks are
also reported on a one-to-nine scale at the University
of Nebraska, nine being the highest mark
[...] signifying failure. Weighted averages
for the students were obtained according to the
[...] credit involved for individual courses.

[...] this study, other than scores
[...] included: first-semester average
course marks, ACE-L scores, and English Placement
scores for two groups of 155 and 154 students
each.


Table 1
Weights Assigned to the Item Correlations
for Each Key

Correlation           Weight
 0.25 or higher        +2
 0.10 to  0.24         +1
-0.09 to  0.09          0
-0.10 to -0.24         -1
-0.25 or lower         -2

Development of the Keys. To develop the
two keys for the inventory, two separate analyses
were made of each item response. In constructing
the key according to the external criterion
technique, the correlation between each
item response and the criterion was estimated
with the use of Flanagan's correlation table (3).
In constructing the key according to the deviate
technique the correlation between each item response
and the distribution of actual-minus-predicted
course marks was obtained in the same
manner. The regression equation used in obtaining
the actual-minus-predicted distribution
was

$$\hat{Y} = .197 X_3 + .195 X_4 + 4.125$$

where X_3 is En[...] form and X_4 is [...].

After the two [...] been obtained, [...]
assigning weights [...] according to the size
[...] only the upper and lower 27 per cent of the
criterion distribution, the significance from zero of
the estimated correlation coefficients was not
ascertained.

The degree of similarity between the weights
assigned to each item response for the two keys
may be seen from Table 2. Each of the 201
items of the inventory contained from two to
five response choices which yielded the total of
629. The coefficient of correlation between the
two distributions of response weights shown in
Table 2 is .509. The deviate technique key contained
368 item response weights other than zero
as compared with 339 such response weights in
the external criterion technique key. It should
be recalled that in responding to the inventory,
however, each subject gave 201 responses, rather
than 629.

The number of items having response weights
of zero for all choices within the item was found
to be 53 for the external criterion technique key
and 42 for the deviate technique key. Of the
items having all response weights of zero, [...]
such items appeared in both keys. In summary,
eleven more items in the deviate technique key
than in the external criterion technique key contained
one or more response weights other than
zero.

The inventories of the 154 students in the
cross validation group were scored using each of
the two keys. To avoid negative scores, the
constant 50 was added to each of the two inventory
scores for the subjects in the cross validation
group. The correlations between each of
the two inventory scores and the criterion and
between these scores and the other test scores in
the battery were computed. The significance of
the contribution of each independent variable to
the prediction scheme was ascertained by analysis
of regression.
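
The keying and scoring steps described above can be sketched briefly. The cut-offs follow the weights listed in Table 1, applied to correlations rounded to two decimal places, and the constant 50 is the one reported in the text. The item labels, response choices, and correlations in the example are hypothetical.

```python
def response_weight(r: float) -> int:
    """Key weight for an item response, given its correlation (Table 1 cut-offs)."""
    r = round(r, 2)
    if r >= 0.25:
        return 2
    if r >= 0.10:
        return 1
    if r > -0.10:
        return 0
    if r > -0.25:
        return -1
    return -2


def build_key(correlations: dict) -> dict:
    """Map each (item, choice) pair to its weight."""
    return {pair: response_weight(r) for pair, r in correlations.items()}


def inventory_score(responses: dict, key: dict, constant: int = 50) -> int:
    """Sum the weights of the chosen responses; the constant avoids negative totals."""
    return constant + sum(key.get((item, choice), 0)
                          for item, choice in responses.items())


# Hypothetical correlations for two items (illustration only).
correlations = {
    ("item1", "a"): 0.31, ("item1", "b"): -0.12, ("item1", "c"): 0.04,
    ("item2", "a"): -0.27, ("item2", "b"): 0.18,
}
key = build_key(correlations)

# One hypothetical student's answers.
score = inventory_score({"item1": "a", "item2": "b"}, key)
```

Scoring the same answer sheet with a key built from correlations with the total criterion and with a key built from correlations with the actual-minus-predicted measures yields the two inventory scores that are compared in the results.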
Table 2
Item Responses Classified by Weight According to Two Keys

External
Criterion           Deviate Technique Key Weight
Key Weight      -2     -1      0     +1     +2    Total
   +2            0      3      7     12     16      38
   +1            1     19     43     49      9     121
    0           10     51    162     55     12     290
   -1           12     57     43     17      0     129
   -2           21     21      6      2      1      51
  Total         44    151    261    135     38     629


Results

In Table 3 are shown the zero order coefficients
of correlation between the variables
in this study. It is interesting to note that
both inventory scores yielded correlations
of the same magnitude with the criterion. It
should also be noted that, in general, the
deviate technique score correlated lower with
the other scores in the battery than the external
criterion technique.


Table 3
Zero Order Coefficients of Correlation Between Each
Pair of Variables for 154 Cross
Validation Students

         X1       X2       X3       X4
Y       .446     .446     .512     .332
X1               .735     .430     .478
X2                        .33[?]   .304
X3                                 .024

Y  = Average Course Mark.
X1 = External Criterion Technique score.
X2 = Deviate Technique score.
X3 = English Placement score.
X4 = ACE-L score.
In Table 4 are shown the multiple and
partial correlation coefficients of the combined
variables. Inspection of Table 4 indicates
that the deviate technique score contributed
significantly to the effectiveness of the total
battery, whereas the external criterion technique
score did not. In addition, the optimal
combination of two prediction variables includes
the English Placement test score and
the deviate technique score of the inventory.


Table 4
Multiple and Partial Correlation Coefficients
for Combined Variables

[Table entries largely illegible; surviving values: .329, .055, .084]

Y  = Average Course Mark.
X1 = External Criterion Technique score.
X2 = Deviate Technique score.
X3 = English Placement score.
X4 = ACE-L score.

* Indicates a partial correlation coefficient significantly
different from zero at the 1 per cent level.


Discussion 


The empirical results from this investiga- 
tion indicate that the deviate technique is 
superior to the external criterion technique 
for keying items of a new test to be added to 
an already existing battery. If the key for 
the life experiences inventory used in this 
investigation had been constructed using only 
the external criterion technique, the inventory 
would not have significantly increased the 
predictive effectiveness of the battery. 

The similarity of the zero order correla- 
tions, .446 and .446, between average course 
marks of the cross validation group and the 
two inventory scores is striking. It is doubt- 
ful that such close correspondence will be 
found in subsequent studies of a similar na- 
ture. In general, it seems reasonable to 
postulate that the more homogeneous the 
criterion, the more divergent such coefficients 
of predictive effectiveness for the two tech- 
niques will become, i.e., with a homogeneous 
criterion the external criterion technique cor- 
relation will probably be higher than that 
found for the deviate technique. The simi- 
larity of the two coefficients found in this 
study is perhaps the result of the heterogeneity 
of the criterion. 

The assignment of weights ranging from 
— 2 to + 2 to the item responses of the in- 
ventory imposed a condition of item selection 
on the keying procedures. Some items were 
assigned a weight of zero according to one
keying technique and weights other than zero 
according to the other technique. Such dif- 
ferences between the weights assigned to the 
item responses will influence the apparent 
length of an instrument and the variability
among the resulting total scores. Thus in
comparing validity coefficients to evaluate
two keying techniques, consideration should
be given to differences between measures of
central tendency and variability of the total
score distributions. If the variability of the
distribution obtained by one keying technique
is considerably larger than the variability
of the other distribution, differences
between validity coefficients could result


which are attributable to differences between 
the total score variabilities rather than to 
actual differences in effectiveness of the tech- 
niques. When the means and standard devia- 
tions for the two total score distributions in- 
volved in this study were computed, the 
means were found to be 55.44 and 54.06 and 
the standard deviations were found to be 
17.59 and 16.02 for the external criterion 
technique and the deviate technique, respec- 
tively. Because these differences are so small, 
it is felt that the greater contribution made 
by the deviate technique key scores to the 
predictive effectiveness of the total battery 
was not attributable to the item selection im- 
posed by the weighting procedure. Ap- 
parently the scored items of the deviate tech- 
nique key contained more similar response 
weights within the items than the scored 
items of the external criterion technique key.
Such a condition could result in a larger mean
and standard deviation for the external criterion
technique key total scores.

The fact that the score on the life ex- 
perience inventory contributed significantly 
to the prediction of average course marks 
suggests the importance of evaluating other 
characteristics of students than scholastic 
aptitude and achievement. A detailed descrip- 
tion of the content, construction, and analy- 
sis of the instrument used as a vehicle for 
this study will be published subsequently.


Summary 

The purpose of this study was to determine
the relative effectiveness of keying the items
of an inventory to be added to an already
existing test battery according to: (1) the
correlation of the item responses with the
total variation in a criterion (first semester
average course marks); and (2) the correlation
of the same item responses with the
criterion variation unexplained by other tests
in the battery. Two sets of keys were constructed
based upon the responses of 155
subjects. Each inventory of 154 subjects
constituting a cross validation group was then
scored using the two keys. The zero order
correlations between the score derived from
each key and the criterion were found to be
identical for the 154 subjects in the cross
validation group. When the two scores were
combined with others in a test battery, the
contribution to the predictive effectiveness of
the total battery made by the key derived
from correlating item responses with the unexplained
variation was found to be significant.
The contribution made by the key
derived from correlating item responses with
the total criterion variation was found to be
not significant.


Received October 21, 1953.


References

1. Davis, F. B. Item selection techniques. In E. F.
   Lindquist (Ed.), Educational measurement.
   Washington: American Council on Education,
   1951.
2. Davis, F. B. Item analysis in relation to educational
   and psychological testing. Psychol.
   Bull., 1952, 49, 97-121.
3. Flanagan, J. C. A table of values of the product-moment
   coefficient of correlation in a
   normal bivariate population corresponding to
   given proportions of successes. New York:
   Cooperative Test Service, 1936.
4. French, J. W. A technique for criterion-keying
   and selecting test items. Psychometrika, 1952,
   17, 101-106.
5. Gulliksen, H. O. Theory of mental tests. New
   York: Wiley, 1950.
6. Horst, A. P. Item selection by means of a
   maximizing function. Psychometrika, 1936, 1,
   229-244.
7. Malloy, J. P. The prediction of college achievement
   with the life experience inventory. Doctoral
   Dissertation, University of Nebraska,
   1953.
8. Meyers, R. C. and Schultz, D. G. Predicting
   academic achievement with the use of a new
   attitude interest questionnaire, I. Educ. psychol.
   Measmt., 1950, 10, 654-663.
9. Neidt, C. O. and Edmison, L. D. Qualification
   responses used with paired statements to measure
   attitudes toward education. J. educ. Psychol.,
   1953, 44, 305-311.
10. Neidt, C. O. and Merrill, W. R. Relative effectiveness
    of two types of response to items of
    a scale on attitudes toward education. J.
    educ. Psychol., 1951, 42, 432-436.
11. Schultz, D. G. and Green, B. F., Jr. Predicting
    academic achievement with the use of a new
    attitude-interest questionnaire, II. Educ. psychol.
    Measmt., 1953, 13, 54-64.
12. Super, D. E. The validity of standard and custom-built
    personality inventories in a pilot
    selection program. Educ. psychol. Measmt.,
    1947, 7, 735-744.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


The Effect of Age and Experience upon Accident Rate 


R. H. Van Zelst 


Kroh-Wagner Co., Riverside, Illinois


Industrial accidents, their prediction and 
control and the various factors related to 
and affecting them have long been a subject 
of study for the psychologist in industry.
One of the specific topics of interest has been 
the relationships existing between the age 
and experience of the worker and his accident 
frequency. 

Most research studies in this area have 
demonstrated the existence of some relation- 
ship between accidents and both experience 
and age. Though by no means universal the 
general conclusion arrived at in these experi- 
ments is that accident frequency tends to 
decline with increasing age and/or experience.


Many of the studies of experience suffer,
however, from a procedural error. The most
common method applied in this type of study 
appears to be to divide the men in a given 
organization into experience groups and then 
to calculate the accident rate of each group. 
The application of this method of necessity 
assumes that if no differences in experience 
exist, all of these different groups would have 
the same average number of accidents. How- 
ever, it is also reasonable to assume that in 
many jobs the high-accident employees will 
tend to drop out either through retirement 
due to injury, separation or voluntarily leaving
employment. Such a natural selection
process tends to retain on the job only those
persons who have maintained a certain safety
standard in their operations.

The usually discovered decrease in accident
frequency with experience may be due
[...] to this natural selection process. What
would then appear to be necessary in order
that the effects of experience may be properly
evaluated is to follow the accident history of
one group of workers over a period of time.
Several studies (1, 2, 3, 5) have done
that. Unfortunately, in most instances these
studies either follow the employee's accident
history for only a relatively short duration or

fail to remove possible influences due to the 
operation of the age variable. 

The study of the relationship between age 
and accident frequency presents a somewhat 
similar picture to the experience problem. 
The typical procedure here again is to sub- 
divide employees into differing age groups 
and to compute the mean number of acci- 
dents for each age group. In most instances, 
however, age is highly correlated with ex- 
perience, thus confusing the issue and making 
it difficult, if not impossible, clearly to ascribe 
any discovered relationship either to age or 
to experience. 

Attempts have been made to minimize the 
effect of the experience variable through the 
utilization of partial correlational methods 
(1, 3). However, these methods are also 
subject to question in that it is not certain 
that experience may be held constant by 
using partial correlation methodology in view 
of the safety selection process previously 
mentioned. It seems probable that the opera- 
tion of these selective factors prevents com- 
pliance with certain basic assumptions in- 
herent in this statistical method. 

It is the author’s purpose therefore to pre- 
sent material obtained in a different manner 
from most of the previous studies in this field 
in an attempt to provide more information 
and gain further insight into the existence of 
the relationships between age and experi- 
ence with accident frequency. 


Subjects

The subjects used in this study are employees
of a copper plant in Indiana. These
subjects were selected from six sections comprising
a single large department operating
metal forming mills. Work tasks were identical
for the members of all groups and no
unusual differences in pressures of production
were observed for the different groups
during the periods of data collection.
Conditions of work, light, heat, ventilation


were also highly similar for all subjects as 
were the number of hours worked. Only em- 
ployees working on the same shift were used 
in this experiment. 

Conditions and methods of work, together 
with type of equipment, remained virtually 
constant throughout the five-year experimental
period.

A total of 1,237 employees who remained 
with the company in the above mentioned de- 
partment for the experimental period had 
their accident records carefully traced and 
charted for each month of the period. In 
addition, other members of the work force
hired at the same time (when the plant was
first opened) but who dropped out or were
separated also had their records carefully
tabulated and recorded. These workers at
the onset of the experimental period totaled
an additional 1,317 workers.
The number of accidents experienced by
each man was readily traced through employee
history data which contained carefully
detailed records of dispensary visits and their
reason and cause. It is felt that this criterion
is valid since it is a compulsory policy
of the company to have all employees who
are injured on the job, regardless of how
slight the injury might be, visit the dispensary
for medical clearance, treatment, and
report. No distinctions as to severity of
accident were made in this study. Only
accidents occurring during working hours and
in actual performance of the job were used.
Accident frequency data were reported on
the basis of mean number of accidents per
1,000 man hours of operation. Payroll records
of the subjects provided the necessary
data for computation.
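
The rate measure can be illustrated with a short sketch. It assumes that a group's monthly rate is formed by pooling the accidents and the payroll hours of all its members, which the text does not state explicitly. The function name and the monthly figures are hypothetical.

```python
def accidents_per_1000_man_hours(accidents: int, man_hours: float) -> float:
    """Accident rate expressed per 1,000 man hours of operation."""
    if man_hours <= 0:
        raise ValueError("man hours must be positive")
    return 1000.0 * accidents / man_hours


# Hypothetical monthly totals for one group: (accidents, man hours worked).
monthly_totals = [(12, 8000.0), (9, 8000.0), (6, 8000.0), (3, 6000.0)]

monthly_rates = [accidents_per_1000_man_hours(a, h) for a, h in monthly_totals]
```

Expressing accidents relative to hours worked keeps months with different amounts of exposure comparable, which a simple count of accidents per man would not do.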


Results 


Figure 1 displays graphically the mean 
number of accidents per 1,000 man hours of 
operation for both of the experimental groups 
and also the entire departmental mean acci- 
dent rate. Accident rate figures are reported 
on a monthly basis for a period of 60 months 
or 5 years. Accident rates for the turnover 
group are not reported after the first 30 
months because of the small number of work- 
ers remaining in that group beyond this 


R. H. Van Zelst 


period of time. (The number of workers in
the turnover group was reduced to 243 mem-
bers at the end of thirty months.)

It can readily be seen from the presented
data that in this particular instance the acci-
dent rate for these workers declines rapidly
during the first five months of operation in
both of the groups. The entire departmental
mean accident rate closely approximates the
rate curves of the two experimental groups.
This is readily explained by the fact that the
two experimental groups, particularly in the
early phases of the experimental period, com-
prised a majority of the entire work force.
The tendency for the departmental rate
curve to be higher during the latter phases of
the experimental period can be attributed to
the incorporation of newly hired employees
into the work force.

The consistency with which the accident
rate curve of the turnover group remains
higher than that of their fellow-workers tends
to support the hypothesis that a natural se-
lection process does exist. The higher level
and more gradual decline in accident fre-
quency for this group of turnover employees
apparently is indicative of an informal and
perhaps to some extent a formal weeding out
of high-accident workers.
In studying these accident rate graphs the
effect of job experience upon the accident rate
of these workers appears to be important
for their first five months of employment, but
seems to be of little significance beyond the
fifth month of employment. The general
leveling off in accident rate after five months
on the job seems to point up the thesis that
experience makes its contribution towards ac-
cident rate reduction by familiarizing the em-
ployee with proper work and safety habits.
Apparently five months of on-the-job experience
is sufficient for these workers on this particu-
lar type of operation to become well enough
trained to reduce accident rate to what may
be considered normal expectancy. It should
be pointed out that these workers did not re-
ceive the benefit of any formalized pre-
assignment training and so actual experience
was called upon to substitute for this formal-
ized training.


These initially high accident rates would
appear to lend further support to the often
stated necessity for proper immediate train-
ing in correct work methodology and safety
habits. To provide an experimental test of
this, the accident rates of another group of
workers were charted. These men had
been hired at various times after the com-
pany was better established and so received
the benefit of formal training in correct job
procedure and safety methods. These men
also performed the same work tasks under
highly similar conditions. Data on this group
for their first fifteen months of employment
are graphically presented in Figure 2.

Effect of Age and Experience upon Accident Rate

Fig. 1. The relationship between experience on the job and the
average monthly accident rate per 1,000 hours of operation for
a non-turnover group and a turnover group. (Curves: non-turnover
group, turnover group, department mean. Vertical axis: average
number of accidents per 1,000 hours of operation. Horizontal
axis: months, 0 to 60.)

The results here follow the same general pat-
tern as for the previous groups. There is
an identical sharp decline in accident
frequency for the early on-the-job period fol-
lowed by the same leveling off pattern.
Noteworthy, however, is the fact that the initial
accident frequency rate is markedly lower for
this group. Furthermore, the level which
approximates what has been termed normal
expectancy for the previous groups is reached
after the third month of on-the-job perform-
ance rather than after the fifth.

In view of the strong similarity between
the work tasks and work environment of this
and the other two previous groups, this reduc-
tion in the frequency of accidents amongst
these workers for this formative period can in
the author’s opinion be traced only to the
benefits derived from the formal training
program.

However, the observed decline, still sharp,
for these trained workers during the early
phases of their employment still suggests the
importance of actual accumulated on-the-job
experience in bringing accident rates down to
what might be considered normal.

Fig. 2. The relationship between experience on the job and the
average monthly accident rate per 1,000 hours of operation for
a group of trainees. (Vertical axis: average number of accidents
per 1,000 hours of operation. Horizontal axis: month.)

Still untested is the effect of age upon acci-
dent frequency. To study this relationship
two other groups were formed. These groups
were matched on the experience variable.
Group A was a young group (Mean Age 
= 28.7 years, S.D. = 1.4, N = 639) with ap- 
proximately three years of experience (Mean 
experience = 2.9 years, S.D. = .45). Group
B was composed of older workers (Mean Age 
= 41.1, S.D. = 2.9, N = 552) also with ap- 
proximately three years of experience (Mean 
experience = 3.2 years, S.D. = .63). 

Accident frequency rate for these groups
(Figure 3) differs markedly throughout the
eighteen month experimental period. Al-
though both groups have the same amount of
experience, the younger group has what ap-
pears to be a significantly higher accident rate
than their older work companions.

As might be expected the younger group’s
(Group A) accident rate is above the depart-
ment level while the mean accident rate of
the older group (Group B) is below the de-
partment’s level for these particular periods
of time.

To further pursue this study of the effects
of age upon accident frequency rate a third
group (Group C) was used. These workers
were similar to Group B in that they too
were an older group (Mean Age = 39.2, S.D.
= 3.1, N = 297), but unlike either of the
two previous experimental groups these men
were inexperienced at the onset of the experi-
mental period. They did, however, receive
the benefit of training prior to actual job
assignment and performance.

As can be seen from Figure 3, the accident
frequency rate for this group again as in other
instances shows the same early sharp decline
followed by a general leveling off to a posi-
tion approximating that of the older group
(Group B). The accident rate of Group C
follows also the pattern of the previous trainee
group although mean accident frequency is
somewhat lower throughout the period.

It is to be noted that from the third month
onward and practically from the [illegible]
month onward the accident rate for this
group of workers is lower than that of their
younger and much more experienced fellow
workers (Group A). It is also to be noted
that this older group functions below the
mean departmental level after what might be
termed the three-month breaking-in period.

The greater strength of the relationship be-
tween age and accident frequency rate as
compared with experience and accident fre-
quency rate becomes even more noticeable as
these experience level differences begin to dis-
appear with the passage of time spent on the
job.

Fig. 3. The relationship between the age of the employee and
average monthly accident rate per 1,000 hours of operation.
(Vertical axis: average number of accidents per 1,000 hours of
operation. Horizontal axis: month.)

It appears then from these data that
age is definitely related to accident frequency.
Older workers in this study even when less
experienced maintain better safety records
than do the younger men. The accident rate
of the younger workers exceeds slightly the
mean accident rate level of the entire depart-
ment despite the disparity in job experience.
Their rate, in fact, appears from the data to
be exceeded only by those employees who are
actually in the breaking-in stage of de-
velopment.


Summary and Conclusions 


The results obtained in this experiment
seem to indicate that at least for these groups
of workers and for this particular type of opera-
tion the effect of experience upon the fre-
quency rate of accidents apparently is limited
to a three to five month period of initial on-
the-job performance. This particular period
of time may be termed a breaking-in period
and is characterized by a sharp decline in the
frequency of accidents. Following this period
there is a leveling off in accident rate through-
out the employee’s work history. This rather
constant level may be considered to be normal
expectancy.

When the workers are given formal train-
ing prior to actual job performance there is a
considerable reduction in early accident fre-
quency rate, which is manifested in lower
initial accident frequency and also in what
may be regarded as a faster developmental
period in that the amount of time required
for these work groups to level off at a
normal expected frequency is significantly
reduced.

It would appear that age in this instance
at least exerts a greater influence upon
accident rate than does experience once the
breaking-in stage is passed. From the com-
parisons made between the matched
groups it has been found that older workers
tend to have fewer accidents than their
younger co-workers. This appears to be true
throughout the employee’s work history when
similar groups are compared. Lower accident
rates are remarkably characteristic of these
older men from their earliest job performance
on.

It is the author’s opinion, although no con-
clusive evidence is presented, that since age
exerts the stronger influence upon accident
frequency rate, beyond initial employment,
it is necessary to explain accidents in part on
the basis of immaturity of employees. Fur-
thermore, the usually found reduction in acci-
dent rate with increasing age and experience
can also be attributed to some extent to the
operation of a natural selection process which
results in the weeding out of workers less fit
for the job. It is also felt that little im-
portance can be attached to the effect of ex-
perience upon accident rate for periods other
than that of initial employment particularly
when the effects of age and the natural selec-
tion process are eliminated. Proper training
in correct work methodology and safety
habits can further reduce the effect of experi-
ence upon accident rate but cannot apparently
substitute completely for actual job perform-
ance in helping the worker to internalize fully
the correct procedures and habits necessary
to efficient operation from the safety stand-
point.
Received September 28, 1953. 


References 


1. Brown, C. W., Ghiselli, E. E. and Minium, E. W.
Experience and age in relation to proficiency
of street car motormen. Report to Municipal
Railway System of San Francisco, 1946.

2. Chaney, L. W., and Hanna, H. S. Safety move-
ment in the iron and steel industry. Bur. La-
bor Statistics, Rept. 234, 1918.

3. Hewes, A. Study of accident records in a textile
mill. J. Industrial Hygiene, 1921, 3, 187.

4. Newbold, E. M. A contribution to the study of
the human factor in the causation of acci-
dents. Ind. Fatigue Research Bd., Rept. 34,
1926.

5. Vernon, H. M. Prevention of accidents. Brit. J.
ind. Med., 1945, 2, 3.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Note on Age and Productive Scholarship of a University Faculty 


Robert A. Davis 
George Peabody College for Teachers 


The results presented are a part of a larger 
study conducted for the Council on Research 
and Creative Work of the University of 
Colorado in 1946. The study was designed 
to survey the research and writing activity of 
the entire faculty (representing all the schools 
and colleges) during a twenty-year period, 
1920-1939 inclusive. This is a period that 
we believed would reflect trends between two 
major wars—a relatively stable period in the 
history of the university. 

During the period covered by the study 
faculty members had been requested annually 
to submit to the Dean of the Graduate School 
a list of papers, articles, and books written 
during the year just ended; and these items 
were published annually in the Graduate 
Bulletin. In order to safeguard accuracy the 
author sent each faculty member a list of his 
contributions as recorded in the Graduate 
Bulletin and requested that they be checked. 

The terms research and writing should be 
noted. No effort was made to differentiate 
between items that were definitely of research 
character and those that were scarcely more 
than descriptive or expository documents. 
Also attention is called to the term activity.
The study did not deal with the difficult prob- 
lem of appraising contributions of faculty 
members. Instead, it was concerned exclu- 
sively with the amount of research and writ- 
ing completed. 

The data reported here concern only one 
aspect of the larger study, that of research 
and writing in relationship to the age of the 
faculty member at the time. During the 
period covered by the study any person con- 
tributing one item was regarded as writing. 
Co-authors were treated in the same manner 
as authors writing independently. In cases 
of multiple authorship each person received 
the same credit that he would have received 


as a single contributor. The curves show
absolute and not proportionate numbers of
contributions. Consequently, they do not
make allowances for the diminishing number
of potential contributors at the upper age
levels. Figure 1, which is based on the rec-
ords of 385 faculty members, tells the story.

Fig. 1. Number of items published and age of contributor.
(Legible curve labels: total writings; articles and monographs.
Vertical axis: number of items, 100 to 500. Horizontal axis: age
in five-year intervals beginning at 25.)

The results suggest a number of questions.
How do research and writing activity relate


to the age at which a faculty member attains 
full professorial status? How do they relate 
to salary increases and promotional policy in 
general? What should be the policy of a 
university administration regarding research 
and writing? What means may be used to 
stimulate research? Other kinds of profes- 
sional growth? If faculty members as a 
group reach a peak in research and writing 


319 


activity around 45 years of age is there evi- 
dence that they continue to grow profession- 
ally in other respects? Is there any funda- 
mental reason for the peak of activity around 
45 years of age? Is this a crucial period in 
the career of a faculty member? The reader 
will think of many other questions. 


Received October 16, 1953. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954 


Relationship of Employee Morale to Ability to Predict 
Responses * 


Rossall J. Johnson 


School of Commerce, Northwestern University 2 


This investigation is concerned with the 
relationship between the morale of an em- 
ployee and the ability of the employee to 
predict the responses of his subordinates and 
the morale of these subordinates. 

There has been some evidence (1, 4, 5) to
indicate that where individuals “knew” and
understood one another they were able to
predict the others’ responses. It would seem
to follow from this that where a group and
leader relationship existed, the ability of the
group members to predict the leaders’ re-
sponses would be dependent upon how well
these group members understood their leader.
And conversely, the ability of the leader to
predict the group members’ responses would
be dependent upon how well the leader under-
stood the group members.

This problem may be clarified by asking
three questions. 1. Do subordinates with
high morale “know” and understand their
supervisor better than low morale subordi-
nates? 2. Is the morale of the subordinates
who “know” and understand their boss higher
than those who do not “know” and under-
stand their boss? 3. Does the supervisor
“know” and understand the high morale sub-
ordinates better than the low morale sub-
ordinates? If the answers to questions 1 and
2 are yes, then one may anticipate the de-
velopment of a questionnaire which will in-
dicate morale by measuring the ability of the
subordinate to predict the responses of his
supervisor. An affirmative answer to question
3 would indicate that the morale of the sub-
ordinates may be estimated by measuring the
ability of the supervisor to predict the re-
sponses of his subordinates.

In order to analyze this problem, the fol- 
lowing null hypotheses were set up: 


1 This paper was presented at the MPA annual
meeting, Columbus, Ohio, April 30, 1954.
2 This is part of a doctoral dissertation done
under the direction of Dr. H. H. Remmers of
Purdue University.


1. There is no significant difference be-
tween the morale scores of subordinates who
can predict the responses of their supervisors
best, and the morale scores of subordinates
who have the least success in predicting the
responses of their supervisors.

2. There is no significant difference be-
tween the ability of high morale subordinates
to predict the responses of their supervisors
and the ability of low morale subordinates to
predict the responses of their supervisors.

3. There is no significant difference be-
tween the morale scores of individual sub-
ordinates whose responses were most success-
fully predicted by their supervisors and the
morale scores of individual subordinates whose
responses were least successfully predicted by
their supervisors.


Procedure 


A sample of 227 subordinates and 25 supervi-
sors was taken from two companies. The sub-
ordinate, for the purpose of this study, is desig-
nated as a randomly selected hourly paid employee
who does not have group leader responsibility
and who has worked for the tested supervisor
at least nine months. The supervisor is defined
as a salaried supervisor who has at least 12 sub-
ordinates (as defined above) reporting directly
to him. This supervisor should have supervised
these 12 subordinates at least nine months.
Eight to 10 subordinates under each of the
supervisors participated in the project.

Three scores were calculated from the ques-
tionnaire: (1) subordinate morale score; (2)
supervisor predicting score; and (3) subordinate
predicting score. The subordinate morale score
is the number of times, out of a possible 20, that
the subordinate selected the most favorable re-
sponse. The supervisor predicting score is the
total number of times the supervisor correctly
predicts the subordinate’s response to 20 ques-
tions. [...]

The subordinate questionnaire consisted of
three parts. Part A was a selection of 20
questions from form A of the test How Supervise?
Part B [...] Study (2); these questions had D
values (3) [...] higher. Part C consisted of the same
questions as in part A but with instructions to
predict the response that the subordinate thought
his supervisor would give to each question.

The subordinate was guaranteed anonymity.
His personal data were on a separate
sheet deposited in a ballot type of box while the
questionnaire with the supervisor's name only
was deposited in another box. This question-
naire and the personal data sheet were later
brought back together by means of a code.

The supervisor questionnaire consisted of two
parts. Part A is made up of the same 20 ques-
tions from form A of How Supervise? as were
used in the subordinates’ questionnaire. The su-
pervisor answered these questions as he would if
he were answering the complete form, except that
the question mark or undecided response alterna-
tive was eliminated. Part B also consisted of the
same 20 How Supervise? questions and 20 morale
questions mentioned above. With part B the
supervisor was given a list of names of the sub-
ordinates who filled in the subordinate question-
naire. A maximum of ten names was on the
list. Each man had a code number assigned,
such as John Jones, No. 1; Bill Smith, No. 2;
Ted Green, No. 3; etc. The supervisor was
asked to predict how each subordinate answered
the 40 questions. For example: when the fore-
man had decided how John Jones had answered
a question, he wrote the number “1” after the
predicted response. He then predicted the re-
sponses of subordinates Bill Smith, Ted Green,
and so on in the same manner until he had pre-
dicted the eight or ten subordinates’ responses to
all of the questions.
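The scoring rules described above amount to simple agreement counts. The following is a minimal sketch only; the function names, the answer coding, and the favorable-response key are hypothetical and are not taken from the questionnaire itself.

```python
from typing import Sequence


def count_matches(first: Sequence[str], second: Sequence[str]) -> int:
    """Number of items on which two answer lists agree."""
    if len(first) != len(second):
        raise ValueError("answer lists must cover the same items")
    return sum(1 for a, b in zip(first, second) if a == b)


def morale_score(responses: Sequence[str], favorable_key: Sequence[str]) -> int:
    """Times the subordinate chose the most favorable response (maximum 20 in the study)."""
    return count_matches(responses, favorable_key)


def predicting_score(predicted: Sequence[str], actual: Sequence[str]) -> int:
    """Times a predicted response matched the response actually given."""
    return count_matches(predicted, actual)
```

The same `predicting_score` function would serve for either direction of prediction, supervisor predicting subordinate or subordinate predicting supervisor, since both are counts of correct predictions over a fixed set of questions.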


Results

Supervisors predicted the responses of 25%
of the subordinates with scores of 14 or
higher. This constituted the high group.
Supervisors predicted the responses of 23%
of the subordinates with scores of 10 or lower.
This made up the low group. A t test was
made to determine if there was a significant
difference between the mean morale score of
the high group and the mean morale score of
the low group.

As shown in Table 1, the hypothesis that
there is no significant difference between the
morale scores of individual subordinates
whose responses were most successfully pre-
dicted by their supervisors and the morale
scores of individual subordinates whose re-
sponses were least successfully predicted by
their supervisors was not rejected. The t
value indicates that there evidently is no sig-
nificant difference between the means of the
individual morale scores.

Table 1

Comparison of Mean Subordinate Morale Scores for
High-Low Supervisor Predicting Scores

                                 Supervisor Predicting Scores
                                 High 25%      Low 23%
Mean Subordinate Morale Score    12.3          11.8
t value                          [illegible]
Significance Level               [illegible]

Nineteen per cent of the subordinates had
scores of 17 or higher on the morale survey
questions. This group was considered the
high morale group. Twenty per cent of the
subordinates had scores of 8 or lower. This
group constituted the low morale group. The
subordinate individual predicting scores on
the 20 questions of form A of How Super-
vise? for the high morale group were added
and a mean score obtained.

The mean score for the low morale group
was obtained the same way. A t test was
made to see if there was a significant differ-
ence between the mean of the high morale
group and the mean of the low morale group.

Table 2 shows that the hypothesis that
there is no significant difference between the
ability of high morale subordinates to predict
the responses of their supervisors and the
ability of low morale subordinates to predict
the responses of their supervisors was re-
jected. The t value indicates that the high
morale subordinates’ mean score was sig-
nificantly higher than the low morale sub-
ordinates’. The difference in the means was
significant at the 1% level.

Table 2

Comparison of Mean Subordinate Predicting Scores
for High-Low Subordinate Morale Groups

                                     Morale Scores
                                     High 19%      Low 20%
Mean Subordinate Predicting Score    12.6          9.8
t value                              13.8
Significance Level                   1%


The mean of the subordinates’ morale 
scores for the subordinates who were most 
successful in predicting their supervisor’s re- 
sponses was tested by the t test method to 
see if it was significantly different from the 
mean of the subordinates’ morale score for 
the subordinates who had the least success in 
predicting their supervisor’s responses. Sub- 
ordinates who predicted the responses of their 
supervisors most successfully were those with 
prediction scores of 14 or higher. This group 
represented the top 24%. Subordinates who 
predicted the responses of their supervisors 
least successfully were those with predicting 
scores of 9 or less. This group constituted 
the bottom 20%. The mean of the subordi-
nate individual morale score for the subordi-
nates who were most successful in predict-
ing their supervisor’s responses was tested by
the t method to see if it was significantly dif-
ferent from the mean of the subordinates’
individual morale score for the subordinates
who had the least success in predicting their
supervisor’s responses.

The hypothesis that there is no significant 
difference between the morale scores of sub- 
ordinates who can predict the responses of 
their supervisors best and the morale scores 
of subordinates who have the least success in 
Predicting the responses of their supervisors 
was rejected. Table 3 shows that the mean
morale score of the subordinates who had
high individual predicting scores was sig-
nificantly higher than the mean morale score
of low predicting subordinates. This differ-
ence as shown by the t value was significant
at the 1% level.

Table 3

Comparison of Mean Subordinate Morale Scores for
High-Low Subordinate Predicting Scores

                        Subordinate Predicting Scores
                        High 24%      Low 20%
Mean Morale Score       13.4          10.2
t value                 3.64
Significance Level      1%
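The comparisons of high and low groups reported above rest on a t test of the difference between two means. The sketch below is illustrative only: it assumes the pooled-variance (Student) form of the test, which the article does not specify, and the function name is hypothetical. No data from the study are reproduced.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(high: Sequence[float], low: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance, plus degrees of freedom.

    Assumes independent groups with roughly equal variances; this is an
    assumption of the sketch, not a statement about the original analysis.
    """
    n_high, n_low = len(high), len(low)
    if n_high < 2 or n_low < 2:
        raise ValueError("each group needs at least two scores")
    df = n_high + n_low - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n_high - 1) * variance(high) + (n_low - 1) * variance(low)) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_high + 1.0 / n_low))
    return (mean(high) - mean(low)) / std_error, df
```

The resulting statistic would be compared with the tabled critical value of t for the returned degrees of freedom at the chosen level, for example the 1% level used in Tables 2 and 3.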


Discussion and Conclusions 


Based on these data, the following con-
clusions may be drawn.

1. It cannot be generalized as to the
morale state of the subordinate and the
ability of the supervisor to predict his re-
sponse. The supervisor is evidently able to
predict the responses of some low morale sub-
ordinates with as much skill as some high
morale subordinates. The failure to reject
hypothesis 1 might be explained by the fact
that some low morale subordinates express
their objections or criticisms of certain areas.
Because they expressed themselves forcefully
the supervisor remembers them
and is thus in a better position to predict this
low morale subordinate responses than he is
able to predict the responses of those who
have average morale.

2. High morale subordinates are better pre-
dictors of their supervisors’ responses than are
low morale subordinates.

3. The subordinates who could predict the
responses of their supervisors best had higher
morale than those subordinates who had the
least success in predicting their supervisors’
responses.

These last two conclusions may be inter-
preted as meaning that those who were better
acquainted with their supervisor had higher
morale and those who had high morale
“knew” and understood their supervisor
better.

In connection with conclusion No. 2 there
was a possibility that high morale employees
were assigning [...] to the supe[...]
employees, [...] more with hi[...] the low
mor[...] their ability [...]

To investigate this phase, the answers
marked by the employee on the How Super-
vise? test were compared with the answers
the employee predicted his supervisor made.
The number of times the answer and pre-
diction differed were tallied. It was found
by the t test method that the average number
of responses which were different for the high
morale group were not significantly different
from the average number of answers which
were different for the low morale group.


Summary 


An analysis was made to see if supervisors
are able to predict the responses of high
morale subordinates more successfully than
those of low morale subordinates. An analy-
sis was also made to see if there was a differ-
ence in morale scores of those subordinates
who were able to predict their supervisors’ re-
sponses most successfully and those who were
least successful in predicting the supervisors’
responses. The results indicate that super-
visors are not able to predict the responses
of high morale subordinates with any more
success than the responses of low morale sub-
ordinates. The results also indicate that
high morale subordinates are able to predict
their supervisors’ responses better than low
morale subordinates.


Received July 2, 1954. 
Early publication. 


References 


1. Dymond, R. F. Personality and empathy. J. 
consult. Psychol., 1950, 14, 343-350. 

2. Harris, F. J. The quantification of an industrial 
employee survey. I. Method. J. appl. Psy- 
chol., 1949, 33, 103-111. 

3. Lawshe, C. H., Jr. A nomograph for estimating
the validity of test items. J. appl. Psychol.,
1942, 26, 846-849.

4. Miller, F. G. and Remmers, H. H. Studies in in-
dustrial empathy. II: Management's attitudes
toward industrial supervision and their esti-
mates of labor attitude. Personnel Psychol.,
1950, 3, 33-40.

5. Patton, W. M. A study of certain psychological
variables related to supervision in the textile
industry. Unpublished doctor’s dissertation,
Purdue University, 1951.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


An Application of Rogerian Concepts to Nurse-Patient 
Relationships 


Lewis Bernstein 1


Veterans Administration Hospital, Denver, Colorado 


From the experience of teaching psychology 
to both student and graduate nurses, it has 
become increasingly apparent that psycholo- 
gists can contribute in an important manner 
to nursing education. This potential con- 
tribution lies in the field of nurse-patient rela- 
tionships, an area in their Preparation which 
the nurses and student nurses themselves find 
incomplete. One nurse put the problem in 
this manner: “From the beginning of our 
training, the idea of caring for the patient, 
rather than the illness, has been emphasized, 
but nowhere in our program do we have the 
opportunity to learn how to put this idea 
into practice.” Others have voiced a more 
specific need in such questions as: “I have a
patient who has been in the hospital for two
weeks and he has not yet had a visitor. He
obviously feels uncomfortable and despond-
ent during visiting hours. Is there anything
I can do to make him feel better, or shall I
ignore it?” “Patient X dies during the night
and is removed from his room. [...] fol-
lowing morning, [...] ask about X’s
whereabouts, [...]uge such as saying
[...] to another ward?” [...] feel unprepared.
If [...] feelings expressed [...]

In a study by Shields (6), schools of nurs-
ing, public health agencies, individual nurses,
and other nursing groups were asked to in-
dicate, by means of a questionnaire, whether
they thought that a basic nursing curriculum
should provide learning experiences intended
to develop certain qualities or abilities. One
such quality was described as: “. . . a real
belief in the essential worth of every human
being and . . . the importance of communicat-
ing this belief by attitudes and [...]” (6,
p. 12). This quality appears to be a direct
translation of Rogers’ concept of reflection
which he defines as “. . . trying to understand
from the client’s point of view and to com-
municate that understanding” (5, p. 452).
Although a large percentage of nurses who re-
plied to the questionnaire felt that this was an
important ability, some of the comments of
the respondents reflect a doubt that such a
quality can be taught. The following com-
ments are among those reported by Shields:
“A person either has or hasn’t this quality.
Shouldn’t be a nurse if she hasn’t it. Can’t
be taught (supervisor of a visiting nurse as-
sociation).” “This comes with [...] and
cannot be taught (private duty nurse).”
“Criminals too? (private duty nurse).”
“Idealistic. Impossible. No one can [...]
believe in the essential worth of every hu-
man being (director of a school).” “[...]
in family teaching, not nursing [...].”
In view of such skepticism, it would seem
worthwhile to determine empirically whether
or not nurse-patient relationships using Rog-
ers’ concept of reflection can be successfully
taught. This study, then, proposes to test
the general hypothesis that nurses’ skills and
attitudes in interpersonal relationships can
be modified in a significant fashion when the
nurses understand the nature of the tech-
niques they use, the attitudes which such tech-
niques express or implement, and the feelings
they generate in patients.

1 We wish to express our appreciation to Miss
Marie L. Brophy, R.N., Chief Nurse, Miss
Jane A. McCarthy, R.N., Assistant [...],
Miss Ruby L. Roepe, R.N., Assistant Chief, Nursing
Education, all of the Veterans Administration Hos-
pital, Denver, Colorado, without whose interest and
cooperation this study could not have been com-
pleted.


Method 


A series of three pretests, to be described below, was
administered to all staff and head nurses on duty at the Denver
VA Hospital.

An Application of Rogerian Concepts 


Each nurse drew a number which was used as identifying
information for the tests, thus providing personal anonymity.
Upon completion of the pretests, it was announced that a hospital
clinical psychologist was to conduct a course in nurse-patient
relationships; that because of the size of the group, he would be
able to work with only half of the nurses at one time. In order
to obviate any feeling that some preference might be operating in
the selection of those to participate in the course, the
selection was made by the use of random numbers, in the presence
of the entire group, using the same numbers with which they
identified their pretests. The procedure provided an experimental
group (those selected at random to participate in the course) and
a control group (those not selected).2 Table 1 indicates the
degree of matching obtained by this randomization.
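
The selection by random numbers described above can be sketched in a few lines of Python. This is only an illustration: the article does not report how the numbers were drawn or how many nurses were placed in each group at the outset, so the even split, the seed, and the function name `assign_groups` are assumptions.

```python
import random


def assign_groups(id_numbers, seed=None):
    """Randomly split identification numbers into two groups.

    Assumption: roughly half of the nurses are selected for the course;
    the article only says the psychologist could work with half at a time.
    """
    rng = random.Random(seed)
    shuffled = list(id_numbers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    experimental = sorted(shuffled[:half])  # selected for the course
    control = sorted(shuffled[half:])       # not selected
    return experimental, control


# 77 nurses took the pretests; the numbers 1..77 stand in for the
# numbers the nurses drew (hypothetical labels).
experimental, control = assign_groups(range(1, 78), seed=1954)
print(len(experimental), len(control))
```

Using the same numbers for test identification and for selection means the draw can be made in public without revealing which test papers belong to which nurse.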
The course with the experimental group began approximately two
weeks following the administration of the pretests. Ten weekly
sessions of [...] hours each were held. The course began with a
presentation of basic techniques which nurses use in responding
to patients, a discussion of the attitudes which these techniques
express, and a discussion of how the patient might react to each
of these techniques. For the remainder of the course, nurses
brought to class incidents from their own ward experience. These
incidents were reported on a form which stated the situation and,
as nearly verbatim as possible, the conversation that took place
between the nurse and the patient. Behavioral responses
accompanying the conversation were also reported. The incidents
were discussed in terms of: (a) why did the patient behave in
such and such a way, and what feelings was he really expressing
by his verbalization and/or behavior; and, (b) how could the
nurse best respond in such a situation. Many of the incidents
were role-played, and the implications of the situation discussed
by the group. An effort was made to conduct the course along the
lines suggested by Rogers’ nondirective concepts (4, 5). That is,
the instructor tried to create an atmosphere in which
participants in the experiment could feel free to express all
shades of opinion and criticism in the discussion of
nurse-patient situations.

Upon completion of this training, both groups (experimental and
control) retook the tests administered before the course was
given.


2 The tests were originally administered to 77 staff and head
nurses. At the time of the posttesting, 59 of the original group
were available: 30 in the experimental group, and 29 in the
control group. [...] subjects who took the pretests were either
[...] or had resigned at the time of the posttests.


Table 1

The Degree of Matching Between the Experimental and Control
Groups Achieved by Random Selection

                                          Experi-
                                          mental     Control
Item                                      (N=30)     (N=29)

Median age                                 33.7       32.7
Mean no. years nursing experience          14.0       12.
No. of graduates of hospital schools       28         26
No. of graduates of collegiate schools      2          3
No. who have received degrees since
  graduation from hospital school           7          7
No. of head nurses                          5          6
No. of staff nurses                        25         23
No. of medical nurses                      10         10
No. of surgical nurses                      9         11
No. of neurological nurses                  5          4
No. of psychiatric nurses                   3          2
No. of operating room nurses                2          1
No. of central supply room nurses           1          1


Measurement Techniques and Hypotheses


1. The Nurse-Patient Situation Test. This test is made up of 35
nurse-patient incidents, modified and adapted from Porter (3),
together with five possible nurse responses to the statement of
the patient. Each of the choices purports to measure one of the
following five categories of response: E (Evaluative), H
(Hostile), S (Supportive), P (Probing), and U (Understanding).

A sample situation from the test, with the re- 
sponse choices, is the following: 

I tell you I hate that doctor of mine. I hate 
him! I hate him! I ask him about my diag- 
nosis and he gives me the brush-off. Tells me 
a diagnosis hasn’t been made yet. Phooey! It 
makes me feel so terrible that I hate him so— 
especially when I have to count on him to get 
well. I—it worries me. 


E (a) This is something you'll certainly want 
to get straightened out. A good rela- 
tionship with your doctor is important 
for your recovery. You'll find he'll 
treat you better if you can make your- 
self have faith in him. 

H (b) You're certainly not acting very grown-
up. These doctors know their business.
You do an awful lot of complaining
about something that you're getting
free.

S (c) I guess most patients go through a pe- 
riod when they don’t like their doc- 
tors. It’s really not at all uncommon. 
I hear that from most patients. But 
things eventually settle down. 


3 Porter (3) used the Interpretive category in place
of our Hostile category. In an independent study of
nurses’ responses, we found the Hostile category
used more frequently than the Interpretive, and that
the few Interpretive statements made by nurses
could be subsumed under the Hostile classification.


P (d) I think we ought to get at the root of 
that worry. Is there anything else 
your doctor has done to upset you be- 
sides not telling you your diagnosis? 

U (e) You're concerned about how sick you 
really are, and it worries you not to 
know for sure what your doctor thinks. 


The above example will also serve to illustrate 
the definitions of the five categories. In the 
Evaluative response, the nurse has made a judg- 
ment of relative goodness of the patient’s feel- 
ings, and has implied how the patient ought to 
feel and what he might do. It would follow that 
the patient might not feel free to further ex- 
plore his feelings about his physician since the 
nurse has, in effect, indicated his feelings are in- 
appropriate. 

The Hostile response in the above illustration
again indicates to the patient the inappropriate-
ness of his feelings and, in addition, subjects him
to ridicule by implying that he is immature, and
that he must accept whatever treatment is given
him since the service is free.

Through the reassurance given the patient in
the Supportive response, the nurse, in effect, de-
nies that the patient has a problem, that he need
not feel as he does. [...] feelings may preclude
further discussion (leaving the nurse with the
feeling that her reassurance has “worked”), [...]
the patient’s feelings.

The Probing response implies that the patient
might profitably discuss the point further, that if
the patient will only give her more information,
the nurse will be able to provide the answer or
solution to his problem.

By means of the Understanding response, [...]
the patient’s point of view and [...] that
understanding to the patient. [...] feeling “safe”
[...] whatever attitudes he has are permissible
[...] now feel free [...].

[...] test of 47 nursing
students at the University of Colorado School of 


Nursing indicates that the five categories of re- 
sponse are relatively independent, with very little 
overlap. Intercorrelations between each cate- 
gory of response with every other category 
yielded low negative correlations with the excep- 
tion of two non-significant low positive correla- 
tions. Furthermore, the test appears to be suffi- 
ciently reliable for use. Split-half reliabilities, 
correlating odd with even items, based upon the 
data of the 47 nursing students, are as follows 
for each category: Evaluative, .77; Hostile, .80; 
Supportive, .74; Probing, .88; and Understand- 
ing, .92. 
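
The odd-even split-half procedure mentioned above can be sketched as follows. This is a minimal illustration, not the author's computation: the item data are invented, the function names are hypothetical, and the article does not say whether the reported coefficients were stepped up with the Spearman-Brown formula, so that step is shown separately as an optional assumption.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(item_scores):
    """Correlate odd-item totals with even-item totals.

    item_scores holds one list per respondent; each entry is 1 if the
    respondent chose the category of interest on that item, else 0.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd, even)


def spearman_brown(r_half):
    """Optional step-up to full test length (not stated in the article)."""
    return 2 * r_half / (1 + r_half)


# Hypothetical data: 4 respondents, 6 items, one response category.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
r = split_half_r(scores)
print(round(r, 2), round(spearman_brown(r), 2))
```

In the actual test each of the 35 incidents would contribute one entry per category, and the procedure would be repeated for each of the five categories.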

Lewis Bernstein


Hypothesis 1: That the differences between the posttest and
pretest scores for the experimental group will show significantly
greater decreases in all categories of response (except
Understanding, which will show a significantly greater increase),
than for the control group.

It is obvious that this Nurse-Patient Situation Test is a
comparatively direct measure of the content of the course, but
may not reflect a more basic change in underlying attitudes of
the [...]. It was felt, therefore, that other independent
measures of attitude change should be included in the test
battery.
2. The F-Scale. As one independent measure of more basic changes
in attitudes, the F-Scale was included in our test battery. This
scale was developed in an extensive study of the “authoritarian
personality,” and is described in detail elsewhere (1). This
scale measures attitudes on a continuum ranging from
authoritarian to democratic.

Hypothesis 2: That the difference between the pretest and
posttest scores for the experimental group will show a
significantly greater shift toward the democratic end of the
scale than for the control group.
3. The Memory Test. In this procedure, a lengthy case history,
constructed from [...] notes on an actual patient, is read to the
subjects. The items in the history can be classified as physical
items (temperature, blood pressure, diagnoses, laboratory
procedures, medications, etc.), and psychological items
(patient’s ward behavior, the degree of his dependency, his
employment history, etc.). Immediately after the history is read,
the subjects are asked to write everything they remember. The
score on this test is:

\[
\text{Score} = \frac{\text{Number of physical items}}{\text{Number of psychological items}} \times 100
\]

A high ratio would indicate recall of more physical items than
psychological. The rationale for using this procedure is that the
case history is too long for the subjects to remember everything;
that what is remembered will be selective; and that the course in
nurse-patient relationships will make the experimental group more
sensitive to psychological factors in the patient’s history.

Hypothesis 3: That the difference between the pretest and
posttest ratios between physical and psychological items will
show a significantly greater drop for the experimental group than
for the control group.
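
The Memory Test score is a simple ratio, and a small sketch may make its scale concrete. The function name and the item counts below are hypothetical; only the formula (physical items divided by psychological items, times 100) comes from the text.

```python
def memory_test_score(physical_items, psychological_items):
    """Ratio of recalled physical items to recalled psychological items, x 100."""
    if psychological_items <= 0:
        # The ratio is undefined if no psychological items were recalled.
        raise ValueError("at least one psychological item is required")
    return physical_items / psychological_items * 100


# Hypothetical recall protocol: 26 physical and 15 psychological items.
print(round(memory_test_score(26, 15), 1))
```

A score of 100 means equal recall of the two kinds of item; scores above 100 mean that more physical than psychological items were remembered.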


Results


Table 2 presents the pre- and posttest scores for both groups,
and the confidence levels of the pre-post differences between
experimental and control groups.

The following facts are evident in Table 2:

1. The experimental group showed a significantly greater decrease
in evaluative responses than the control group.


Table 2

Differences Between Pretest and Posttest Scores

                                 Experimental        Control
Test                             Pre     Post      Pre     Post      P

Nurse-Patient Situation Test:
  Evaluative responses          10.0      1.0     12.5     11.0    .001
  Hostile responses              1.0       .5      1.7      1.4    .90
  Supportive responses          10.2      3.0     10.3     10.0    .001
  Probing responses              9.7      2.2      8.2      9.5    .001
  Understanding responses        4.5     28.3      2.3      3.6    .001
F-Scale                         99.7     91.0    113.0    116.1    .01
Memory Test                    173.4    144.1    173.0    193.0    .05
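
As a reading aid for Table 2, the pre-to-post change in each group, and the gap between the two changes, can be tabulated directly from the reported scores. The sketch below is illustrative only: the dictionary and function names are hypothetical, and it does not reproduce the significance tests, whose method the article does not describe.

```python
# Scores from Table 2: (experimental pre, experimental post,
#                       control pre, control post)
TABLE_2 = {
    "Evaluative responses": (10.0, 1.0, 12.5, 11.0),
    "Hostile responses": (1.0, 0.5, 1.7, 1.4),
    "Supportive responses": (10.2, 3.0, 10.3, 10.0),
    "Probing responses": (9.7, 2.2, 8.2, 9.5),
    "Understanding responses": (4.5, 28.3, 2.3, 3.6),
    "F-Scale": (99.7, 91.0, 113.0, 116.1),
    "Memory Test": (173.4, 144.1, 173.0, 193.0),
}


def change_scores(row):
    """Return (experimental change, control change), each post minus pre."""
    exp_pre, exp_post, ctl_pre, ctl_post = row
    return exp_post - exp_pre, ctl_post - ctl_pre


for test, row in TABLE_2.items():
    exp_change, ctl_change = change_scores(row)
    gap = exp_change - ctl_change  # how much more the experimental group moved
    print(f"{test:26s} {exp_change:+7.1f} {ctl_change:+7.1f} {gap:+7.1f}")
```

A negative change is a decrease from pretest to posttest; the hypotheses concern the gap between the two groups' changes rather than either change alone.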
2. Neither group showed a significant decrease in hostile
responses. The exceedingly small number of hostile responses by
both groups (out of a possible total of 35) minimizes the
importance of this category.

3. The experimental group showed a significantly greater decrease
in supportive responses than the control group.

4. The experimental group showed a significant decrease in
probing responses. The control group showed a slight increase in
probing responses, although this increase is not significant.

5. The experimental group showed a significantly greater increase
in understanding responses than the control group.

These data support our first hypothesis: that the experimental
group would show a significantly greater decrease in evaluative,
hostile (not significant), supportive, and probing responses, and
a significantly greater increase in understanding responses than
the control group.

6. The experimental group showed a significant shift in attitudes
toward the democratic end of the F-Scale. The control group
showed a slight, but not significant, shift toward the
authoritarian end of the scale. These data support the prediction
in hypothesis 2.

7. The experimental group showed a significantly lower ratio
between physical and psychological items on the Memory Test. The
control group showed a slightly higher, but not significant,
ratio. These data support the prediction in hypothesis 3.


Discussion


As previously stated, the attitudes and 
skills which we hoped to convey to the experi- 
mental group are based upon the nondirec- 
tive concepts of Rogers (4,5). It was, there- 
fore, interesting to note that the group went 
through a process similar to that in a thera- 
peutic counseling situation. Early in the 
course, many negative attitudes were freely 
expressed. As these were accepted by the 
instructor, more positive attitudes began to 
appear. The class itself noticed the con- 
spicuous change in the “climate” of the 
course. 

The question may arise if any more was 
accomplished than to train these nurses to 
recognize an understanding response. But 
the accompanying changes in sensitivity to 
psychological and social factors in a pa- 
tient’s case history, and the less authoritarian 
scores achieved on the F-Scale, do suggest 
that more basic changes took place. At a 
later date we plan to test the relative perma- 
nence of these changes. Several of the par- 
ticipants in the course were asked to explain 
their lowered scores on the authoritarian 
scale. The consistent response was that dur- 
ing the course they learned to respect the 
feelings of others, that patients could par- 
ticipate in the solution of their own prob- 
lems, and that these attitudes could carry 
over to other spheres. 

In addition to the changes in test findings 
there is other evidence that more than con-
tent was learned in the course. Nurse super-
visors have reported that most nurses in the
experimental group are using these skills.
Several of the group have requested addi- 
tional training along the lines offered in the 
course. Even more convincing were the dif- 
ferences in understanding of patients’ feel- 
ings noted between the incidents turned in 
for discussion early in the course and those 
submitted toward the end of the course. 

This study has demonstrated that nurse- 
patient relationships making use of Rogerian 
concepts can be successfully taught. It is not 
to be inferred that a course such as that de- 
scribed in this paper is all that is necessary. 
Ideally, such a course should be taught early 
in nurses’ professional education, and fol- 
lowed by appropriate ward supervision. The 
nurses represented in this study come from 
59 nursing schools in 22 states. Yet, our pre- 
test results indicate that very few had any 
meaningful preparation in this area. The 
study by Phillips and Agnew (2) indicates 
that the technique of giving understanding 
responses is “. . . considerably more than a 
simple extension of knowledge of interper- 
sonal relations possessed by any reasonably 
intelligent and emotionally mature person.” 
In other words, such skills and attitudes can- 
not be assumed to result from general nurs- 
ing experience; they must be taught. And, 
with the current emphasis on interpersonal 
relations in nursing, the method herein de- 
scribed appears to be one manner in which 
such teaching may be accomplished. 


Summary 


Two groups of nurses—30 in an experimen- 
tal group, and 29 in a control group—took a 
battery of three pretests. The Nurse-Patient 
Situation Test measured five categories of 
nurses’ responses to patients’ statements: 
Evaluative, Hostile, Supportive, Probing, and 
Understanding. The F-Scale measured social 
attitudes on a continuum ranging from au- 
thoritarian to democratic. The Memory Test 
measured the ratio of physical items to psy- 
chological items remembered from a lengthy 
case history. 

Following the administration of the pre- 
tests, the experimental group participated in 
a course in nurse-patient relationships. An 
effort was made to conduct the course along 
the lines suggested by Rogers’ nondirective
concepts. That is, the instructor tried to
create an atmosphere in which participants
could feel free to express all shades of opin-
ion and criticism in the discussion of nurse-
patient situations.
Upon completion of the course, both the ex-
perimental and control groups again took the
series of tests described above, and the dif-
ferences between the pre- and posttest scores
were compared. On the Nurse-Patient Situa-
tion Test, the experimental group showed a
significantly greater decrease in Evaluative,
Supportive, and Probing responses, with a
correspondingly greater increase in Under-
standing responses than the control group.
No significant decrease in Hostile responses
was demonstrated by either group. How-
ever, the exceedingly small number of Hos-
tile responses minimizes the importance of
this category.

The experimental group showed a signifi-
cant shift toward the democratic end of the
F-Scale, while the control group showed no
significant change.

The ratio of physical to psychological items
on the Memory Test showed a significant de-
crease for the experimental group, while the
control group showed no significant change.

It is concluded that nurses’ skills and at-
titudes in interpersonal relationships can be
modified in a significant fashion when nurses
understand the nature of the techniques they
use, and the attitudes which such techniques
express or implement, and the feelings they
generate in patients.


Received October 26, 1953. 


References 


1. Adorno, T. W., Frenkel-Brunswik, Else, Levinson,
D. J., and Sanford, R. N. The authoritarian
personality. New York: Harper, 1950.

2. Phillips, E. L. and Agnew, J. W., Jr. A study of
Rogers’ ‘reflection’ hypothesis. J. clin. Psy-
chol., 1953, 9, 281-284.

3. Porter, E. H., Jr. An introduction to thera-
peutic counseling. New York: Houghton-
Mifflin, 1950.

4. Rogers, C. R. Counseling and psychotherapy.
New York: Houghton-Mifflin, 1942.

5. Rogers, C. R. Client-centered therapy. New
York: Houghton-Mifflin, 1951.

6. Shields, Mary R. A project for curriculum im-
provement. Nurs. Res., 1952, 1, 4-31.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Instructor-Centered and Student-Centered Approaches in 
Teaching a Human Relations Course 


Francis J. Di Vesta

Syracuse University *


The present study is a report on an experiment to evaluate: (a)
the achievement of students in terms of outcomes desired from a
human relations course; and (b) the relative effectiveness of two
methods of teaching in achieving these outcomes. The course is
one segment of a curriculum for the training of medical
administrative supervisors in a military school. The experiment
was conducted because of the lack of consistency in the findings
from previous studies (1, 3, 11, 12, 13, 14, 17, 18, 19, 21, 22,
23). It is based in a general way on some of the procedures used
in a previous study conducted by Canter (6). The present study
differs from Canter’s in that it compares two teaching methods,
uses airmen rather than insurance company supervisors as
subjects, and incorporates a greater number of measures than used
by Canter.


Questions Studied


The present study was limited to a study of the following
questions:

1. Is the twenty-hour block of instruction in human relations
sufficient to increase the achievement level of students?

2. If changes are made by the instruction, what is the extent and
direction of this change in the: (a) knowledge level; (b)
attitudinal level; and (c) skill level?

3. Is one of the two methods of instruction more effective than
the other for producing change in the student achievement level?


General Procedure


The study was designed to take advantage of the best experimental
procedure possible with a minimum disruption of course work and
of routine generally employed in the conduct of the course. The
over-all design was simply the pre-test, instruction, post-test
design and is shown below in more detail by steps.

* This study was conducted while the author was on the staff of
the Officer Education Division, Human Resources Research
Institute, Maxwell Air Force Base, Alabama.

Step 1—Pre-Test: All students in both the 
control and experimental groups were given all 
tests. 

Step 2—Instruction: Students were divided 
into three groups for instruction purpose. Ex- 
perimental group one received instruction by the 
instructor-centered method. Experimental group 
two received instruction by the student-centered 
discussion method. The third group was the 
control group and received only the technical in- 
struction given in the course but did not receive 
the instruction given in human relations. 

Students were selected for one or the other of 
the teaching methods on the basis of sociometric 
leadership ratings in a group performance test 
(see below for description of the test). Accord- 
ingly, those students from the first group to be 
administered the group performance test who 
were rated 1, 3, 5 were assigned to the instructor- 
centered method and those students who were 
rated 2, 4 and 6 were assigned to the student- 
centered method. Those students in the second 
group who were rated 1, 3 and 5 were assigned 
to the student-centered method and those who 
were rated 2, 4 and 6 were assigned to the 
instructor-centered method. This procedure was 
repeated until assignments had been made for 
all individuals. Those individuals who were as- 
signed to the student-centered method were fur- 
ther sub-divided into sections of six people each 
for the actual instruction. These sub-groups 
were formed on the basis of a random sampling 
design which would assure that none of the in-
dividuals who were in the test groups worked to-
gether during the course. This procedure was a
caution which assured the experimenter that one 
method would not have an advantage over an- 
other method on the group performance test as 
a result of an informal structure which might de- 
velop over a period of time during the formal 
course work. 
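
The counterbalanced assignment rule described above can be written out as a short sketch. It assumes that the alternation simply continues over all of the test groups ("this procedure was repeated"), that each group holds six airmen rated 1 through 6, and that the control group is handled separately; the function name `assign_method` is hypothetical.

```python
def assign_method(test_group_index, rating):
    """Teaching method for one airman.

    test_group_index: 0 for the first group given the group performance
                      test, 1 for the second, and so on.
    rating:           sociometric leadership rating within that group (1-6).
    """
    odd_rating = rating % 2 == 1
    first_pattern = test_group_index % 2 == 0  # groups 1, 3, 5, ...
    # First pattern: ratings 1, 3, 5 go to the instructor-centered method.
    # Second pattern: the assignment is reversed.
    if odd_rating == first_pattern:
        return "instructor-centered"
    return "student-centered"


for group in (0, 1):
    print(group + 1, [assign_method(group, r) for r in range(1, 7)])
```

Reversing the rule on alternate groups keeps either method from receiving all of the top-rated members of the test groups.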

Step 3—Post-test: All students in the experi- 
mental and control groups were given all tests. 
The testing situations and schedules were exactly 
the same for each individual on both the pre- and 
post-test. 

During the course of the experiment two ob- 
servers were placed with the class given instruc- 
tion by the student-centered discussion method 
and one observer was placed with the class given 
instruction by the instructor-centered method.
These observers checked on: (a) the extent to
which instructional content material varied be-
tween the two classes and (b) the extent to
which the instructor’s approach was consistently 
oriented to the instructional method he was rep- 
resented as using. Students in each class were 
also required to describe the instructional pro- 
cedure through the use of a check list. 


The Criteria 


The measures for criterion purposes were se-
lected on the basis of the following requirements:


1. The test should measure some aspect of hu- 
man relations ability or of leadership ability as 
established in previous studies. 

2. The test should measure some aspect of 
school objectives. 

3. The test should be dependable in its meas- 
urement properties. 

4. The test battery should represent measures 
of human relations or leadership ability at the 
knowledge, attitude and behavioral levels (see 
below for fuller descriptions of these levels). 

5. The tests should represent measures of ob- 
jectives desired in the course. 

The tests, classified according to level of meas- 
urement, are listed and described briefly below. 


Knowledge tests. What facts about human re- 
lations and leadership does the student know? 


1. Personnel relations test (20). Developed 
by the Air Force’s Human Resources Research 
Center for use with personnel in administrative 
positions. Measures what the student knows 
about supervisor-subordinate relationships. 

2. “How Supervise?” test (7). 

Attitude tests. How does he feel about certain 
kinds of leadership orientation? 


1. Problems of the Non-commissioned Officer
in Charge (4). A set of five scales developed
and validated at Harvard University. Measures
orientation toward discipline (severe-not severe);
assessment of promotion practices (perceives
much wrong-perceives little wrong); and han-
dling of informal pressures in organization (try
to satisfy these pressures-ignore pressures).

2. Leadership Opinion Questionnaire (8). Con- 
tains two scales. One measures orientation to- 
ward initiating structure in working with sub- 
ordinates and aggressive directing of subordi- 
nates toward achieving the goal. The second 
scale measures the extent to which the super- 
visor is considerate of the feelings of those un- 
der him. Developed at Ohio State University. 
Skill tests. The skill tests were used in an at-
tempt to measure how the individual behaves in
a realistic situation. These are divided into In-
direct measures and Direct measures of behavior.

The indirect measures are paper and pencil
tests which establish situations for the respond-
ent and require him to react to these situations.
The advantage of this type of measurement is
that it is possible to present a variety of situa-
tions to the respondent.

The direct measures are actual situations
wherein the individual reacts to a realistic prob-
lem in conjunction with other individuals.

Only the non-commercial tests are described. De-
tailed descriptions of the commercially available tests
may be found in the references provided (5, 9).

Indirect Measures: 

1. Social intelligence test (16). 

2. Prediction of human reactions test (15). 
Developed for the Detroit Edison Company. It 
was revised for this experiment to be adaptable 
to the airman population. Primarily oriented to-
ward judging how an individual would react
given certain characteristics of individuals and
circumstances which might occur in a supervisor-
subordinate relationship.


Direct Measures:


Students were assigned to a group composed
of six airmen. They were told that they were to
constitute a board to act on a morale problem
occurring in a hospital staff. While acting as a
board they were observed and rated by tech-
nicians trained for this kind of observation.
Scoring was accomplished by using Bales’s
interaction scoring form and by rating individu-
als according to the four roles of leadership ac-
tivity, ability, likability and contribution of best
ideas. Ratings on the four roles were also made
by examinees at the close of the session as well
as by observers. Sixteen groups in all were used.

Instructors and Instructional Method 


The instructors were selected because of their ability to use one
of the methods. Each felt his competence was greatest in the
method he was to use and was selected by his colleagues and
superiors as being the most competent in that method of
instruction. (Had it been possible, the experiment would have
been replicated [...] carefully briefed on the [...] warned not
to emphasize content beyond [...] in the lesson plans. Observers
were used to assure that instructors remained within the content
provided.

Decisions were made between both instructors and the experimenter
as to how the instructional methods were to differ. Both
observers and students rated the instructional methods on a check
list of descriptive items. Items which, by item analysis using
the chi-square statistic, were found to differ significantly
between the two methods of instruction are summarized
qualitatively below. These items serve to describe the conduct of
the instructional method as perceived by the students.
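
A chi-square comparison of one check-list item across the two classes might look like the sketch below. The article gives no counts and does not describe the exact form of the analysis, so the 2 x 2 layout, the invented frequencies, and the absence of a continuity correction are all assumptions; the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

                      item checked   item not checked
        Method A           a                b
        Method B           c                d

    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_05_DF1 = 3.841  # chi-square critical value, 1 d.f., .05 level

# Hypothetical item: checked by 10 of 30 students under Method A
# and by 20 of 30 students under Method B.
chi2 = chi_square_2x2(10, 20, 20, 10)
print(round(chi2, 2), chi2 > CRITICAL_05_DF1)
```

An item whose statistic exceeds the critical value would be treated as distinguishing the two methods, in the sense used for the qualitative summary that follows.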




Table 1

A Comparison Between Pre-Post-Test Scores for Control and Combined Experimental Groups

                                              Control          Experimental
                                               N=24              N=[...]
Criterion Measure                            Mean    S.D.      Mean     S.D.

Personnel Relations Test             Pre     24.1     4.8      21.2      5.6
                                     Post    25.3     4.6      25.0      5.2
                                     t       1.84              9.43**
How Supervise? Test
  Total                              Pre     62.5     7.3      58.5     10.7
                                     Post    61.9     8.5      64.4     1[...]
                                     t        .62              6.61**
  Supervisory Practice               Pre     12.3     3.2      [...]    28
                                     Post     1.7     3.2       2.3      3.3
                                     t       1.11              1.41
  Company Policies                   Pre     24.4     3.9      23.5      4.9
                                     Post    25.4     3.5      24.2      4.7
                                     t       1.45              1.50
  Supervisory Opinion                Pre     25.8     3.9      23.2      5.8
                                     Post    24.8     4.6      27.9      6.1
                                     t       1.70              8.69**
NCOIC
  Promotion-Orientation              Pre      2.4     1.2       2.8      1.0
                                     Post     2[...]   .9       2.9      1.3
                                     t        .65               .42
  Assessment of Rewards              Pre      3.1     [...]     3.5      1.3
                                     Post     2.4     1.2       3.1      1.2
                                     t       3.74**            3.08**
  Informal Pressures                 Pre      3.0     1.4       2.6      1.1
                                     Post     2.7     1.2       2.8      1.1
                                     t        .94              1.53
  Discipline-Justice                 Pre      2.8     1.2       2.4       .9
                                     Post     2.6     1.2       2.1       .9
                                     t        .67              12.58[...]
  Discipline-Initiative              Pre      3.4     1.1       3.5      1.1
                                     Post     3.2     1.4       2.9      1.3
                                     t        .84              4.21**
Leadership Opinion Questionnaire
  Initiating Structure               Pre     52.0     7.7      [...]    [...]
                                     Post    48.5     [...]    [...]    [...]
                                     t       4.38**            [...]
  Consideration                      Pre     58.0     7.0      56.3      8.0
                                     Post    56.5     7.4      60.0      6.8
                                     t       1.52              5.83**
Social Intelligence
  Judgment in Social Situations      Pre     20.7     3.2      19.6      3.8
                                     Post    21.2     3.5      20.0      2.3
                                     t       1.19              2.32*
  Observation of Human Behavior      Pre     37.0     7.8      33.0     11.7
                                     Post    40.7     8.1      36.6     10.2
                                     t       2.79*             6.14**
Prediction of Human Reactions        Pre     48.6    10.3      47.7     [...]
                                     Post    [...]   [...]     50.0     [...]
                                     t        .30              2.97**

 * = p .05 or <.05.
** = p .01 or <.01.




Method A 
Instructor-Centered 


Suggestions were evaluated by instructor who
advised or led class to correct conclusion.

Techniques and steps for activities were given 
by the instructor. 

Instructor (rather than student) considers and 
handles individual problems and questions. 

The instructor is the focus of attention. Stu- 
dent to student attention happens rarely or oc- 
casionally. 


Method B 
Student-Centered Discussion 


Instructor encouraged suggestions and used this 
procedure to stimulate class to carry out class 
activities themselves. 


Techniques and steps for activities emerged
from the group discussion.

Group consideration of individual problems is
encouraged by the instructor.

The instructor is the focus of attention when- 
ever the discussion or activity needs guidance or 
information; otherwise students directed their 
attention to one another. 


Results 


Only the results obtained from a study of 
the written tests and of the sociometric data 
are reported here. 

The first hypothesis tested was that the 
course, regardless of method of instruction, 
produced an improvement in student achieve- 
ment level. A comparison of pre and post 
scores for the control group with pre and 
post scores for the group receiving instruction
was made for each of the tests. This com- 
parison is shown in Table 1. 

Significant (p < .01) changes were made by 
the course segment in the knowledges related 
to human relations and leadership skills. 
These changes are reflected in the Personnel 
Relations and total How Supervise? test 
scores. The changes in the How Supervise? 
sections on company policies and supervisory 
practices were not significant although these 
tests also measured knowledge. An inspec- 
tion of these tests indicates, however, that the 
content area is more appropriate to an in- 
dustrial situation and would not be covered 
in a military course. 

Important changes were also made in stu- 
dent attitudes. The most significant change 
in the attitude area is reflected in the pre 
and post test scores of students on the How 


Di Vesta 


Supervise? section on supervisory opinion and 
on the Leadership Opinion Questionnaire sec- 
tion on consideration for others. Another 
change is noted in the NCOIC Problems. 
Student responses indicated a more lenient 
attitude toward problems of discipline and 
promotion after having attended the course
than they did prior to attendance. It is
interesting to note that all groups (including
the control group) made significant changes 
on the Leadership Opinion Questionnaire sec- 
tion on initiating structure. This change was 
in the direction of a less favorable attitude 
toward active directing and structuring of 
situations in which leadership might be dem- 
onstrated. Undoubtedly some of the change 
occurred as a result of practice effect; how- 
ever, the implication might be made that this 
change has occurred as a result of being in
the school setting. An interesting hypothesis
is stimulated here that the informal school
setting, wherever it may be, may have a detri-
mental effect on attitudes toward active ini-
tiation of structure. It is doubtful that such
a change is more than a temporary one, al-
though this hypothesis, too, should be a sub-
ject for further investigation.

In the area of indirect measurement of hu- 
man relations skills the course, in general, 
effected significant changes. Students taking 
the course made significant gains on the Social
Intelligence test section on judgment in social
situations, and the Prediction of Human Re-
actions test. All groups (including the con-
trol group) made significant gains on the
Social Intelligence test section on observa-
tion of human behavior.

The pre-post correlations for control and
experimental groups on each of the measures
are shown in Table 2. Although the reliability
of the measures used here was available
from previous studies using these instruments,
it was desired to obtain some indication of
reliability when used with our subjects. The
pre-post correlations for the control group re-
flect a measure of the test-retest reliability.
These correlations are shown in Table 2
with similar correlations for the experimental
group. Five of the measures had pre-post
correlations of .75 to .81; five measures, cor-
relations of .62 to .68; three measures, cor-
relations of .49 to .53; and two measures had
pre-post correlations of .29 and .34.


Table 2


Pre-Post Correlations for Control and Experimental
Groups on Each of the Tests


                                              Group
Test                                  Control*    Experimental
                                      N = 24      N = 94

Personnel Relations Test                .75         .74
How Supervise? Test
  Total                                 .79         .69
  Supervisory Practice                  .68         .42
  Company Policies                      .62         .55
  Supervisory Opinion                   .76         .62
NCOIC Problems
  Promotion-Orientation                 .34         [illegible]
  Assessment of Rewards                 .61         .45
  Informal Pressures                    .29         .21
  Discipline-Justice                    .49         [illegible]
  Discipline-Initiative                 .51         .35
Leadership Opinion Questionnaire
  Initiating Structure                  .67         .61
  Consideration                         .77         .66
Social Intelligence
  Judgment in Social Situations         .81         .90
  Observation of Human Behavior         .66         .87
Prediction of Human Reactions           .53         .72


* The control correlations amount to a measure of
reliability.

Canter's (6) research, in some respects,
was similar to the present one. His study
was conducted with supervisors of three large
insurance companies. A control group was
used but only the lecture discussion method
was used in his study. The course was the
same length (20 hours) as the one used in
the present study and the content was similar.
A comparison of the results of both studies,
on tests appearing in both studies, is shown
in Table 3.²


² To reduce printing costs Tables 3, 4, 5, and 6
have been deposited with the American Documenta-
tion Institute. Order Document 4323 from the ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for 35 mm. microfilm or
$1.25 for 6 X 8 in. photocopies. Make checks pay-
able to Chief, Photoduplication Service, Library of
Congress.


The insurance company supervisors
achieved higher average scores on both the
pre-test and post-tests than did the airman
population, on both tests. However, the air- 
man population made a significant (p < .01 > 
.001) change in scores, as a result of the 
course, on the Prediction of Human Reac- 
tions test whereas a significant change was 
not reported for the insurance company su- 
pervisors. Similarly, the airman population 
made gains: (a) as great as the insurance 
company supervisors on the “How Super- 
vise?” total score; and (b) greater than the 
insurance company supervisors on the “How 
Supervise?” supervisory opinions score. On 
the other hand, Canter reports a significant 
change on the “How Supervise?” company 
policy score for the insurance company su- 
pervisors while the change for the airman 
population was not significant on this par- 
ticular test. 

Knowledges, Attitudes and Indirect Meas- 
ures of Skills. As was noted earlier in this 
report, one of the weaknesses of this part 
of the study was that it was impossible to 
reverse the roles of the two instructors. 
However, it is assumed that each of the in- 
structors was the best that could have been 
obtained for using the particular method. 
Students in each group rated their respec- 
tive instructors the same way with regard to 
interest of the instructor in the subject. 
They described their respective instructor 
as “being interested in the academic prog- 
ress of the students and interested in the 
students as individuals.” The gains for the 
experimental groups, contrasted with the 
gains for the control group, on each of the 
tests, are shown in Table 4. The F test was 
used for testing significance of differences in 
gains for all groups. Where a significant 
F was found, the t test was applied between 
groups. Significant differences were found by 
the t test to occur only between the experi- 
mental and control groups. 

In general, the evidence does not point to 
either method of instruction as being superior 
to the other. There was a general tendency, 
however, for the students taught by the lec- 
ture method to make greater gains on the 
knowledge and attitude tests than those stu- 
dents taught by the discussion method. 
These differences in gains made by the ex- 




perimental groups were not significant sta- 
tistically. 

Leadership Skills. The measurement of 
leadership skills was conducted by placing the 
examinees in a simulated board meeting. 
The purpose of this meeting was to act on a 
morale problem which occurred in a hospital. 
During the meeting the examinees were rated 
by observers using Bales’s Interaction Process 
analysis. Students were provided twenty 
minutes to act on the problem and five addi- 
tional minutes for summarizing the discus- 
sion. After the meeting was over the re- 
spondents ranked one another from 1 through 
6 on leadership, guidance, and best ideas.
The same procedure was followed on the 
post-test. (See the section on procedures for 
a further description of how individuals were 
assigned to sections.) 

The intercorrelations between the socio- 
metric rankings are shown in Table 5.

The changes in rankings are shown in
Table 6. This table is based on the average 
score of the individual. Those individuals 
with an average leadership score of 0 to 1.99 
were placed in category I (most leadership 
ability). Those with an average score of 

2.00 to 3.99 were placed in category II and 
4.00 to 6.00 were placed in category III (low 
leadership).
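The category rule just described can be sketched in a few lines. This is only an illustration of the stated cut-offs; the function name `leadership_category` is a hypothetical label and not part of the study.

```python
def leadership_category(avg_rank: float) -> str:
    """Map an average sociometric leadership rank to category I, II, or III.

    Cut-offs follow the text: 0 to 1.99 -> I, 2.00 to 3.99 -> II,
    4.00 to 6.00 -> III. The function name is an assumption for illustration.
    """
    if not 0 <= avg_rank <= 6:
        raise ValueError("average rank must lie between 0 and 6")
    if avg_rank < 2.0:
        return "I"    # most leadership ability
    if avg_rank < 4.0:
        return "II"   # middle category
    return "III"      # low leadership
```

A student whose average rank moved from 4.2 on the pre-test to 3.5 on the post-test would, under this rule, move from category III to category II.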

There was a significant change (p > .02 <
.05) via the chi-square test of significance in
the pre-post lecture group on leadership.
This difference is attributed largely to the in-
crease in category I individuals and the de-
crease in category II individuals. Five peo-
ple were rated as I (high leadership) before
instruction and 13 after instruction.

A comparison of these two tables, however, 
shows some other trends which, although not 
significant, are worthy of consideration. For 
the discussion group individuals originally 
assigned to the middle category (N = 27), 
there was very little movement out of this 
category. There was, however, a considera- 
ble movement of the discussion people origi- 
nally assigned to category III. Approxi- 
mately 50% of these individuals improved 
their leadership scores on the post-test. This 


movement does not occur for students in the 
lecture group. 




Summary 


1. A 20-hour block of instruction in hu-
man relations, as taught in a course for air-
men, made a significant change in student
performance. Students, taken as a body,
showed significant gains in achievement (as
measured by the pre-test results compared
with the post-test results) on the following
tests: (a) Personnel Relations Test; (b)
How Supervise? (Total Score); (c) How
Supervise? (Supervisory Opinions); (d)
Leadership Opinions (Consideration Score);
and (e) Social Intelligence (Judgment in
Social Situations).

2. Furthermore, as measured by certain of
these same tests the course was as effective
as a similar course given to the supervisors
of three large insurance companies. Students
in the Medical Administrative Supervisors
course made gains as great as the insurance
company supervisors on the "How Super-
vise?" test.

3. The use of the discussion method of
teaching appeared to have a slight advantage
over the instructor-centered approach in im-
proving leadership ability. There was a
strong tendency for students starting the
course at a low leadership level to improve
through the discussion method. This tend-
ency did not exist for individuals taught by
the instructor-centered approach. Students
at the upper levels of leadership ability were
not affected much by either method. This
finding was necessarily based on a small
number of people. It should be made clear
that a tendency, not a clear-cut change, was
found. A replication of the experiment would
provide more definitive data.


4. [Illegible in the scanned source.]

5. There is a pronounced and significant
change in student attitude in general, toward
initiating structure. Students, after being in
school, tend to feel that initiating structure
in group situations is less important than
they did before the course started. This
change appears to occur by virtue of being in
the school situation and is not directly at-
tributable to a particular teaching method.
The evidence that this occurs as a result of
being in a school situation is that this change
occurred for each group including the con-
trol group. Further research would be neces-
sary if it is desirable to know whether this is
a temporary change or a permanent one. It
is doubtful that the change is permanent.
Further research would also be required to
yield answers about how to develop a school
atmosphere that would promote positive at-
titude toward initiating structure.


Received October 29, 1953. 


References 


1. Asch, M. F. Nondirective teaching in psychol-
ogy: an experimental study. Psychol. Monogr.,
1951, 65, No. 4 (Whole No. 321).

2. Bales, R. F. Interaction scoring form. Cam-
bridge, Mass.: Addison-Wesley Press, Inc.,
1949.

3. Bane, C. L. The lecture versus the class dis-
cussion method of college teaching. Sch. &
Soc., 1928, 21, 300-302.

4. Borgatta, E. Questionnaire on problems of the
non-commissioned officer. Cambridge, Mass.:
Harvard University, 1952.

5. Buros, O. K. The fourth mental measurements
yearbook. New Jersey: Gryphon Press, 1953.

6. Canter, R. R. A human relations training pro-
gram. J. appl. Psychol., 1951, 35, 38-45.

7. File, Q. W., and Remmers, H. H. (Ed.) "How
Supervise?" Form A. New York: The Psy-
chological Corporation, 1948.

8. Fleishman, E. A. Foreman's leadership opinion
questionnaire. Columbus, Ohio: Ohio State
University, 1952.

9. Fleishman, E. A. "Leadership climate" and su-
pervisory behavior. Ohio: Personnel Research
Board, The Ohio State University, 1951.

10. Guetzkow, H. Groups, leadership and men.
Pittsburgh: Carnegie Press, 1951.

11. Husband, R. A statistical comparison of the
efficacy of large lecture versus smaller recita-
tion sections upon the achievement in general
psychology. Amer. Psychologist, 1949, 4, 216.
(Abstract.)

12. Jones, H. E. Experimental studies of college
teaching. Arch. Psychol., 1923, 10, No. 68.

13. Katzell, R. A. Testing a training program in
human relations. Personnel, 1946, 23, 85-97.

14. Krech, D. and Crutchfield, R. S. Theory and
problems of social psychology. New York:
McGraw Hill Book Co., Inc., 1948.

15. Meyers, H. H. Human relations: A test of
ability to predict human reactions. New
York: H. H. Meyers, 1952 (Revised).

16. Moss, F. A., Hunt, T., and Omwake, K. T. So-
cial intelligence test, SP edition. Washington,
D. C.: George Washington University, 1947.

17. Roseborough, Mary E. Experimental studies of
small groups. Psychol. Bull., 1953, 50, 275-
303.

18. Sanford, F. H. and Hemphill, J. F. An evalua-
tion of the text "Psychology for Naval Lead-
ers used in leadership training at the naval
academy." College Park: University of Mary-
land, Department of Psychology, 1948.

19. Spence, R. B. Lecture and class discussion in
teaching educational psychology. J. educ. Psy-
chol., 1928, 19, 454-462.

20. Zacarria, M. A. Personnel relations test. San
Antonio, Texas: Human Resources Research
Center, 1952.

21. Development of evaluative and predictive meas-
ures in the Air Force Officer Candidate
School. Washington, D. C.: Human Resources
Research Commission Research Bulletin 47-1,
1947.

22. Research on military leadership. Washington,
D. C.: Panel on Human Relations and Morale,
Committee on Human Resources, Research
and Development Board, 1951.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Vocational Interests and Socio-Economic Status 


John W. Gustad 
University of Maryland 


Prominent among the factors which are 
thought to influence the choice of an occu- 
pation is socio-economic status. This may 
include both the status accorded to the oc- 
cupation by others as well as the level of 
aspiration of the individual concerned. It is 
this latter, the level of aspiration of the in- 
dividual with respect to occupations, that is 
the concern of the present investigation. 

Early in his work, Strong (9) recognized 
the need for a measure of the status aspira- 
tions of individuals completing his interest 
blank. He accordingly developed a scale 
which he called Occupational Level (hence- 
forth referred to as OL). This was accom- 
plished by contrasting the item responses of 
laboring men with those of men in business or 
the professions earning over $2500 per year. 
It should be noted that at the time this scale 
was built, this figure represented the upper 
fifth of the income distribution in this country. 

Since its publication, a considerable amount
of research has been done on and with OL.
It has seemed promising as a measure of
motivation, level of aspiration, or socio-eco-
nomic status drive. Darley (2, p. 60) has
called it ". . . a quantitative statement of the
eventual adult level of aspiration." Darley
(2), Gustad (5), Kendall (6), Ostrom (7, 8),
and Strong (9) have shown that OL has some
relationship to success or staying power in
college.

In an extensive study of OL as a measure
of drive, Barnett et al. (1) reported several
interesting relationships. OL was found to
correlate .44 with a self-rating of level of
aspiration in one school, .04 in another; with
a verbal level of aspiration measure, .26 in
the first school, .18 in the second. Though
the results were not entirely clear, it was con-
cluded that there was some relationship be-
tween OL and other measures of level of
aspiration. Stewart, in the same monograph,
concluded that ". . . the mother may have a
greater influence on the development of voca-
tional interests than has hitherto been as-
sumed" (p. 17). It should be noted, how-
ever, that the sample studied was composed
of the sons of skilled workmen.

Recently, Gough (3, 4) has developed two
scales for measuring different aspects of socio-
economic status. One is essentially a short-
ened, more easily administered version of
scales used to assess actual, objective status.
The other attempts to get at the individual's
level of aspiration regardless of his actual
status. These will henceforth be referred to
as objective and subjective status respectively.

The Problem 


The present study was designed to answer
two principal questions: first, how, if at
all, do various interest groups differ in terms
of socio-economic status, however measured;
second, what are the relations among the
various measures of status, all of which are
designed to get at a common variable in
different ways?

The subjects were all men students in the
junior classes of the colleges of Arts and
Sciences and Engineering at Vanderbilt Uni-
versity. Men were selected both because of
the generally better understanding of their
interests as well as for the fact that there
is no OL key on the women's form of the
Strong.

All subjects completed the Strong Voca-
tional Interest Blank as well as the two scales
devised by Gough. Interest blanks were
scored for all thirty-nine occupational keys
as well as for the three clinical keys, OL, In-
terest Maturity, and Masculinity-Femininity.
Interest profiles were sorted in accordance
with the method outlined by Darley (2) into
primary interest groups. Those subjects who
had no primary patterns were retained for
study as a separate group (N.P.). Twenty-
six cases, approximately ten per cent of the
sample, had more than one primary. Ex-
amination of the profiles showed that in all
but four cases one primary might be con-
sidered to be stronger or "more primary"
than the other and was accordingly chosen.
In the remaining cases, secondary and tertiary
patterns were inspected and a judgment made
in favor of one or the other primary in terms
of the total configuration. Those areas in
which the subjects had primary patterns were
as follows: Biological Sciences, Physical Sci-
ences, Sub-technical, Social Welfare, Business
Detail, Sales, and Verbal-Linguistic.

After L1 tests indicated homogeneous vari-
ances, analyses of variance were made for
each status measure across all interest groups.
Product-moment correlations among the sta-
tus measures were also computed.


Results 


The results of the analyses of variance are 
Shown in Table 1. Of the three status meas- 
ures, only OL showed significant differences 
among interest groups. 
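As a rough check on the F ratios reported in Table 1, each F can be recomputed as the between-groups variance estimate divided by the within-groups estimate. The sketch below assumes the Table 1 figures are variance estimates (mean squares); the within-groups values for the two Gough scales are taken as Total minus Between (34.23 - 22.75 and 31.00 - 22.83), because those cells are damaged in the scan. The variable and function names are illustrative only.

```python
# Variance estimates as read from Table 1. The "within" values for the two
# Gough scales are inferred as Total minus Between (an assumption, since the
# scanned cells are damaged).
variance_estimates = {
    "OL": {"between": 397.96, "within": 18.66},
    "Objective Status": {"between": 22.75, "within": 11.48},
    "Subjective Status": {"between": 22.83, "within": 8.17},
}


def f_ratio(between: float, within: float) -> float:
    """F is the between-groups estimate divided by the within-groups estimate."""
    return between / within


f_values = {
    measure: round(f_ratio(est["between"], est["within"]), 2)
    for measure, est in variance_estimates.items()
}

for measure, f in f_values.items():
    print(f"{measure}: F = {f}")
```

Under these assumptions the recomputed ratios agree with the F row printed in Table 1.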

To investigate further the situation with
regard to group differences on OL, tests of the
significance of the differences between all
means were made. These are included in
Table 2. While there were scattered signifi-
cant differences in several groups, the two
groups which appeared to be most consist-
ently different were the Sub-technical and
Verbal-Linguistic. The OL scores of the
former tended to be below average for the
total sample while those of the latter were
above average. These results are in close
agreement with those reported by Strong (9)


Table 1


Analyses of Variance of Status Measures
Across Interest Groups


                            Status Measure

Variance                    Objective    Subjective
Estimate*        OL         Status       Status

Between          397.96     22.75        22.83
Within            18.66     11.48         8.17
Total            416.62     34.23        31.00
F                 21.33      1.98         2.79
P                 <.001      >.05         >.05


* Degrees of freedom for all three measures were as
follows: for Between, 7; for Within, 244; Total, 251.




who correlated OL scores with scores on in- 
dividual scales. 
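The entries of Table 2 are simple differences between group means on OL (row group minus column group), so they can be regenerated from the means alone. The following is a minimal sketch using the eight group means printed in Table 2; the dictionary and function names are assumptions made for illustration, and the sketch does not reproduce the significance tests themselves.

```python
from itertools import combinations

# Mean OL score for each interest group, as printed in Table 2.
ol_means = {
    "Bio. Sci.": 54.73,
    "Phys. Sci.": 53.20,
    "Sub-Tech.": 47.69,
    "Soc. Wel.": 51.85,
    "Bus. Det.": 53.30,
    "Sales": 56.31,
    "Verb.-Ling.": 60.56,
    "N.P.": 54.55,
}


def mean_differences(means: dict) -> dict:
    """Return the row-minus-column difference for every pair of groups."""
    return {
        (row, col): round(means[row] - means[col], 2)
        for row, col in combinations(means, 2)
    }


differences = mean_differences(ol_means)
print(differences[("Sub-Tech.", "Verb.-Ling.")])
```

The largest gap is the one between the Sub-technical and Verbal-Linguistic groups, which is consistent with the observation that these two groups were the most consistently different.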

Finally, the correlations among the three 
measures were computed. They were as fol- 
lows: OL and subjective status, .07; OL and 
objective status, .10; objective and subjective 
status, — .03. None of these was statistically 
significant. Gough (3) reported a correla- 
tion of .52 between his two scales in a sample 
of high school seniors. 
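One way to see why correlations of this size are not significant is the usual t transformation of a product-moment correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2). The sketch below assumes that each correlation was based on all 252 subjects and uses an approximate two-tailed .05 critical value of 1.97 for 250 degrees of freedom; both are assumptions for illustration and are not stated in the article.

```python
from math import sqrt

N_SUBJECTS = 252        # assumed: all subjects entered every correlation
T_CRITICAL_05 = 1.97    # assumed: approximate two-tailed .05 value, 250 df

# Correlations among the three status measures, as reported in the text.
correlations = {
    "OL vs. subjective status": 0.07,
    "OL vs. objective status": 0.10,
    "objective vs. subjective status": -0.03,
}


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a product-moment correlation against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


t_values = {pair: t_for_r(r, N_SUBJECTS) for pair, r in correlations.items()}
significant = {pair: abs(t) >= T_CRITICAL_05 for pair, t in t_values.items()}

for pair, t in t_values.items():
    print(f"{pair}: t = {t:.2f}, significant at .05: {significant[pair]}")
```

Under these assumptions none of the three t values reaches the critical value, which agrees with the statement that none of the correlations was statistically significant.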


Discussion 


From the foregoing, it must be concluded 
that at least for the present sample there is 
independence among the three status meas- 
ures and only significant differentiation among 
interest groups in the case of OL. Several 
possibilities may account for these findings. 

In the first place, only OL was specifically 
built to differentiate among occupational 
groups, but even it was not directly related 
to specific interest groups or occupations. 
Yet Barnett et al. (1, p. 13) say that “The
OL scores may be hypothesized as reflecting 
the individual’s socio-economic goals in life.” 
Further, on p. 17, they say, “The OL score 
is so constructed that it should indicate the 
socio-economic level of an individual's inter- 
ests.” In many ways, the development of 
OL was quite similar to that of subjective 
status; both involve self-descriptions about 
preferences for activities, reactions, feelings, 
etc. 

Another possibility lies in the nature of the 
sample which in the present case was drawn 
from students attending a private, fairly ex- 
pensive, above average socio-economic status 
university. These men were for the most 
part preparing for jobs in the professions or 
in business management. This is in direct 
contrast to the sample used by Stewart (1) 
described above. There may have been a 
ceiling effect operating to restrict the range 
near the upper limit. Yet if this were the 
case, such an effect should presumably have 
operated on the other scales in the same way 
as on OL which did not happen. 

It may be that OL is a more specific-to- 
occupations kind of measure than the other 
two scales. This should be studied, but the 
manner of development of all three makes 




Table 2


Mean Differences on Occupational Level for All Interest Groups


                                                 Interest Group

Interest                     Phys.    Sub.-     Soc.      Bus.               Verb.-
Group         Mean     N     Sci.     Tech.     Wel.      Det.     Sales     Ling.     N.P.

Bio. Sci.     54.73    26    1.53     7.04**    2.88**    1.43     -1.58     -5.83**     .18
Phys. Sci.    53.20    24             5.51**    1.35      -.10     -3.11     -7.36     -1.35
Sub.-Tech.    47.69    54                      -4.16**   -5.61**   -8.62**  -12.87     -6.86
Soc. Wel.     51.85    20                                -1.45     -4.46     -8.71     -2.70
Bus. Det.     53.30    23                                          -3.01     -7.26     -1.25
Sales         56.31    49                                                    -4.25      1.76
Verb.-Ling.   60.56     9                                                               6.01
N.P.          54.55    47
Total                 252


* Denotes significant at or beyond the .05 level.
** Denotes significant at or beyond the .01 level.

[Significance markers are legible in the scan only for the entries
starred above; markers on the remaining entries could not be recovered.]


this appear unlikely. The study cited above 
(1) is again pertinent. Gough’s objective 
status scale probably gives greatest weight 
to factors contributed by the father. If 
Stewart’s results may be accepted, the inter- 
est group in which the individual is finally 
found is more a function of maternal in- 
fluence. 

What is probably needed is more work on 
the nature and dimensions of vocational in- 
terests as well as on socio-economic status. 
There are some contingencies, for instance, 
which should be considered. An individual
from a high status home might have what is 
for him a low status score and yet still be 
average or above. Similarly, another person 
from a low status home might have what for 


him is a very high status score and he too 
might be average.


Conclusions 


1. Of the three status measures studied,
only OL differentiated significantly among
the interest groups.

2. Study of the mean differences with re-
spect to OL showed that those individuals
with Sub-technical interests tended to have
below average OL scores while those with
Verbal-Linguistic interests tended to be above
average on OL.

3. There was no significant correlation
among the three status measures in the pres-
ent sample.


Received September 25, 1953. 


References 


1. Barnett, G. J., Handelsman, I., Stewart, L. H.,
and Super, D. E. The Occupational Level
scale as a measure of drive. Psychol. Monogr.,
1952, 66, No. 10 (Whole No. 342).

2. Darley, J. G. Clinical aspects and interpretation
of the Strong Vocational Interest Blank. New
York: Psychological Corporation, 1941.

3. Gough, H. G. A short social status inventory.
J. educ. Psychol., 1949, 40, 52-56.

4. Gough, H. G. A new dimension of status: I. De-
velopment of a personality scale. Amer. sociol.
Rev., 1948, 13, 401-409.

5. Gustad, J. W. Academic achievement and Strong
Occupational Level scores. J. appl. Psychol.,
1952, 36, 75-78.

6. Kendall, W. E. The Occupational Level scale of
the Strong Vocational Interest Blank for Men.
J. appl. Psychol., 1947, 31, 283-287.

7. Ostrom, S. R. The OL key of the Strong Voca-
tional Interest Blank for Men and scholastic
success at the college freshman level. J. appl.
Psychol., 1949, 33, 51-54.

8. Ostrom, S. R. The OL key of the Strong test
and drive at the twelfth grade level. J. appl.
Psychol., 1949, 33, 241-248.

9. Strong, E. K., Jr. The vocational interests of
men and women. Stanford: Stanford Univer-
sity Press, 1943.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Permanence of Interests and Interest Maturity * 


Kalmer E. Stordahl 


Arkansas Polytechnic College, Russellville, Arkansas 


Many counselors in working with college 
and precollege youth use Strong’s Vocational 
Interest Blank. In using this blank, or any 
other interest inventory, they are concerned 
with the problem of the permanence of scores
obtained on the blank. The Strong blank
has a scale, Interest Maturity, which is used
by many counselors as a measure of the prob-
able stability of a counselee's interest profile.
They assume a positive relationship between
stability of interests and Interest Maturity
score. There is, however, very little or no
evidence to support this assumption. For an
account of how the Interest Maturity scale
was constructed and the evidence for its re-
lationship to permanence of interests, see
Strong (4). The present study was designed
to test whether or not scores on the Interest
Maturity scale are related to interest stability.


Method 


In 1949 the Vocational Interest Blank was of-
fered on an optional basis to all high school
seniors who participated in the state-wide testing
program in the State of Minnesota. Approxi-
mately 3500 senior boys completed the blank.
These completed blanks were made available to
the investigator by the Student Counseling Bu-
reau at the University of Minnesota.

A check was made of the University of Minne-
sota enrollment in 1951 to determine how many
of these boys were enrolled at that time. It was
found that 331 boys who had completed the
blank in 1949 were enrolled. A sample of 206
of these boys was contacted and asked to again
complete the Strong blank; 182, 88 per cent,
complied with this request. One subject omitted
a number of items making his blank unusable so
that tests and retests for 181 subjects were used
in the study.

The minimum time between test and retest
was [illegible] years and the maximum time did not
exceed 2.5 years. The mean age of the 181 sub-
jects at the time of the retest was 19.8 years.

* This paper is based upon a portion of a Ph.D.
thesis submitted to the graduate faculty of the Uni-
versity of Minnesota. The author wishes to ac-
knowledge the guidance of his advisor, Dr. Willis E.
Dugan.

The tests and retests for the 181 subjects were
scored for forty-four occupational scales and for
Interest Maturity, Occupational Level and Mas- 
culinity-Femininity. To determine whether In- 
terest Maturity score was related to stability of 
interests, some measure of stability for the indi- 
vidual was needed. Kendall's (2) coefficient of 
concordance, W, was used for this purpose. The 
coefficient W is based on the method of ranks. 
It is related to Spearman’s rho but has the ad- 
vantage of being appropriate for any number of 
observations, whereas rho is applicable only to 
two sets of data. In the present study, rho 
would also have been appropriate as only two 
sets of data were used. 
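To make the relation between W and rho concrete, the sketch below computes both coefficients for a single hypothetical test-retest pair. It assumes untied ranks and uses five made-up scale ranks where the study used forty-four; the function names and the example ranks are illustrative and are not data from the study.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m rankings of n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    rank_totals = [sum(ranking[i] for ranking in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((total - mean_total) ** 2 for total in rank_totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five scales on the test and on the retest.
test_ranks = [1, 2, 3, 4, 5]
retest_ranks = [2, 1, 3, 5, 4]

w = kendalls_w([test_ranks, retest_ranks])
rho = spearman_rho(test_ranks, retest_ranks)
print(w, rho)
```

With only two sets of ranks the two coefficients carry the same information, since W = (rho + 1) / 2 in that case.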

Coefficients of concordance were computed be- 
tween each individual’s test and retest profile. 
The forty-four occupational scales were used in 
computing this coefficient. 

The subjects’ interest profiles were arbitrarily 
divided into three groups of approximately equal 
size on the basis of their W values. Those with 
coefficients of concordance between test and re- 
test of .906 to .977 were designated as a “high” 
stability group (N= 60), those with concord- 
ance values of .820 to .905 were designated as an 
“average” stability group (N= 61), and those 
with concordance values of .419 to .818 were 
designated as a “low” stability group (N = 60). 
The Interest Maturity scores of these three 
groups on the first test, i.e., the 1949 test, were 
then compared. 


Results 


The coefficients of concordance are of in- 
terest themselves as a measure of the sta- 
bility of individual Strong profiles. The 
range of coefficients was from .42 to .98 with 
a median of .87. All but fifteen of the 181 
coefficients were significantly greater than 
zero at the .01 level. Since W has a direct 
relationship to Spearman’s rho, these figures 
can also be expressed in terms of rho. The 
median rho would be .74. 
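The conversion from W to rho can be written out as a short worked equation. It uses the standard relation between W and the average Spearman coefficient over all pairs of m rankings, specialized here to the two rankings (test and retest) of the present study; the symbols m and \bar{r}_s are introduced only for this illustration.

```latex
\bar{r}_s = \frac{mW - 1}{m - 1}
\qquad\Longrightarrow\qquad
\rho = 2W - 1 \quad (m = 2),
\qquad
\rho_{\text{median}} = 2(.87) - 1 = .74 .
```

The same conversion applied to the extremes of the range, .42 and .98, gives rho values of -.16 and .96.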

The means and standard deviations of the 
Interest Maturity scores for the “high,” 
“average,” and “low” interest stability groups 
are given in Table 1. Bartlett’s test for 
homogeneity of variance indicated that the
variances were homogeneous (P > .05). An 
analysis of variance of the Interest Maturity 
scores (Table 2) showed that no significant
difference existed between the means of the
three stability groups (P > .05).


Table 1


Means and Standard Deviations of the Interest Maturity
Scores for Subjects with High, Average, and
Low Coefficients of Concordance


W value                 N      Mean     S.D.

High (.906-.977)        60     46.7     8.8
Average (.820-.905)     61     48.6     7.0
Low (.419-.818)         60     46.5     8.9


Table 2


Analysis of Variance of Interest Maturity Scores for
Subjects with High, Average, and Low
Coefficients of Concordance


Source       df      SS          MS        F        P

Between        2      159.73     79.865    1.177    >.05
Within       179    12142.09     67.833
Total        181    12301.82

Note: Bartlett's test for homogeneity of variance:
chi square = 4.17; P > .05.


Although the Interest Maturity scores on
the first test did not differ for the three sta-
bility groups, the mean Interest Maturity
score for the entire sample of 181 increased
from 47.2 on the first test to 52.0 on the re-
test. This increase was significant at the .01
level.

These results fail to substantiate the as-
sumption of a positive relationship between
interest stability and Interest Maturity score.
From this, one may conclude that the pres-
ent Interest Maturity scale is not useful as
a means of estimating the probable stability
of a precollege male's interest profile. More
useful would be a key built by contrasting
the responses of persons whose interests re-
main stable with the responses of persons
whose interests do not remain stable over a
period of time.²


² The Student Counseling Bureau at the Univer-
sity of Minnesota has begun work on such a key.


Summary

A sample of 181 males who had completed
Strong's Vocational Interest Blank as high
school seniors were retested two years later
as college students. Using Kendall's coef-
ficient of concordance, W, as a measure of
the relationship between the test-retest pro-
files, coefficients were computed for each of
the 181 pairs of profiles. When those in-
dividuals with high (N = 60), average (N
= 61), and low (N = 60) W values were com-
pared with respect to Interest Maturity scores
on the first test, they were found to be homo-
geneous. Thus, the results of this study do
not support the assumption of a positive re-
lationship between interest stability and In-
terest Maturity score.


Received October 1, 1953.


References 


1. Johnson, P. O. Statistical methods in research. New York: Prentice-Hall, 1949.

2. Kendall, M. G. The advanced theory of statistics, Vol. I. (4th Ed.) London: Charles Griffin & Co., 1948.

3. Stordahl, K. E. The stability of Strong Vocational Interest Blank patterns for pre-college males. Unpublished doctor’s dissertation, University of Minnesota Library, 1953.

4. Strong, E. K., Jr. Vocational interests of men and women. Stanford: Stanford Univer. Press, 1943.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


The Strong Vocational Interest Blank and College Achievement 


Ralph M. Rust and F. J. Ryan 


Division of Psychiatry and Mental Hygiene, Department of University Health, 
Yale University 


An awareness that prediction of college grades by means of “intellectual” factors had reached a point of diminishing returns stimulated extensive efforts to explore the predictive value of other personality variables. A number of authors (2, 3, 11) have reviewed these attempts. Thus far, the coefficient of alienation left by the present predictors has been little reduced.
The Strong Vocational Interest Blank for Men (SVIB), because of the reliability of its scales and its wide usage, has been included in much of the research on academic achievement. Most of these studies have been summarized by Strong (10). A diversity of designs, definitions of achievement, and instruments, often along with methodological deficiencies, renders interpretation of results difficult. It does appear, however, that keys for the Strong (e.g., 12, 13) have been developed which can add to the prediction of college grades afforded by intelligence test scales. Yet, such scales fell into disuse through their failure to add to a predictive battery which includes secondary school grades.

In the authors’ (8, 9) preliminary investigation of personality variables associated with academic achievement, the different definition of achievement used appeared to warrant a re-examination of this factor’s relationship to SVIB.


At present, the best available estimate of an incoming freshman’s grades at Yale College is his “general predicted score” (1). This measure is the dependent variable in a multiple regression equation of which the three independent variables are: (1) adjusted secondary school record; (2) Scholastic Aptitude Test score (SAT); and (3) the total of three College Entrance Board Examinations (CEEB). Achievement was measured in terms of deviation from predicted score. Thus, unlike most previous studies, the investigation was concerned with achievement beyond that predicted by a battery which includes secondary school record. A recently reported study by Melville and Frederiksen (5) on freshman engineering students at Princeton also used adjusted secondary school grades as one component of the predicted score.

Though the results yielded by the first ex- 
perimental groups (Yale College classes of 
1950 and 1951) gave little promise that the 
Strong would have practical prediction value 
for academic achievement, the scales showed 
more than a chance relationship to achieve- 
ment status. Further, the significant results 
obtained for Group V scales and the Mascu- 
linity-Femininity (M-F) scale appeared to 
offer indirect support for a hypothesis de- 
veloped earlier (8, 9). For these reasons, 
SVIB was included in a battery administered 
to three other sets of experimental groups. 
Data were available for an additional group. 
These extensive replications permit an im- 
proved appraisal of SVIB as it relates to col- 
lege academic achievement. 


Subjects 


Selection of subjects was based on the rela- 
tionship between grades and predicted scores. 
The procedure, described elsewhere in greater 
detail (8, 9) was designed to yield three 
groups of Yale undergraduates who would be 
equated for predicted score, but who would 
differ widely in grades. The regression line 
of grades on predictions was drawn on a 
scattergram. Lines parallel to the regression 
line were drawn so as to cut off approximately 
the most extreme ten per cent of both the 
positive deviants from predicted score, over- 
achievers (O's), and the negative deviants, 
underachievers (U’s). A third group, normal- 
achievers (N’s), included those students in cells 
cut by the regression line. This procedure 
was applied to each sample separately. Sub- 
jects included in four samples had accepted 
invitations to participate in the study and 
were tested in either the junior or senior year.
In one sample, students were routinely administered the test during the freshman year.¹
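The selection procedure described above can be approximated numerically. The following is a minimal sketch, not the authors' method: they worked graphically on a scattergram, whereas this version computes residuals from a least-squares line. The function name, the `band` parameter standing in for "cells cut by the regression line," and the sample data are all assumptions made for illustration.

```python
def classify_achievers(predicted, grades, tail=0.10, band=1.0):
    """Label each student O (overachiever), U (underachiever),
    N (normalachiever) or '-' (unused) from the deviation of the
    grade from the regression line of grades on predictions."""
    n = len(grades)
    mean_x = sum(predicted) / n
    mean_y = sum(grades) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, grades))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(predicted, grades)]

    # Cut-offs parallel to the regression line that isolate roughly
    # the most extreme `tail` fraction on each side
    ordered = sorted(residuals)
    k = max(1, round(tail * n))
    low_cut, high_cut = ordered[k - 1], ordered[n - k]

    labels = []
    for r in residuals:
        if r >= high_cut:
            labels.append("O")
        elif r <= low_cut:
            labels.append("U")
        elif abs(r) <= band:  # hypothetical stand-in for cells on the line
            labels.append("N")
        else:
            labels.append("-")
    return labels


# Hypothetical data: ten students, two of whom deviate sharply
predicted = [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
grades = [70, 71, 72, 73, 82, 67, 76, 77, 78, 79]
labels = classify_achievers(predicted, grades)
```

Because the groups are defined by residuals rather than by raw grades, they are equated for predicted score by construction, which is the point of the design.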

Table 1 indicates that the combined experi- 
mental groups do not differ in predicted scores 


¹ Strong scores for this group were obtained by J. R. Wittenborn.


Table 1 
Comparison of Groups on Prediction, Components of Prediction, and Average Grade 
Average 
School Grade Average 
Prediction Adjusted SAT CEEB Grade 2 
M SD M SD M SD M SD M SD T 
U 76.9 49 76.5 5.9 58.7 8.3 57.8 5.6 71.4 3.5 175 
N 77.1 49 76.5 5.8 59.0 8.2 575 65 779 32 165 
oO 774 49 77.2 3.6 59.8 8.7 57.7 6.1 84.3 3.4 483 
T TL 5.0 76.7 53.8 59.2 84 57.6 6.1 18.2 6.1 
CRoun 3 a 3 5 17.1 
CRuo 8 1.0 1.2 2 32.2 
CRwo 6 12 9 2 18.3 


or components of predicted scores, but differ 
significantly in academic average. 


Results 


The ability of SVIB occupational scales to 
separate the experimental groups is shown in 
Table 2. Comparisons among groups were 
made by means of chi-square with the cut- 
off point for each scale taken at the median 
of the total group. None of the 44 scales 
differentiated U’s from N’s. Overachievers
differ significantly (p < .05) from U’s on 11
scales, and from N’s on 12 scales, nine of 
These significant 
erachievers score 
Ips on scales for 
ool Superintend- 
ent, Minister, Musician and C.P.A, They 
s on Mathematician, 


Overachievers score 
lowest on Sales Manager, Real Estate Sales- 


A key² for each achievement group was developed from item analyses of two samples. Uncorrected odd-even reliabilities are .43 for the U key, .38 for the N key, and .42 for the O key. Results of the application of these keys to the remaining three samples combined are shown in Table 3. Results of subtracting each subject’s U score from O score (O-U score) are also shown. The U key and the O-U score yield significant differences between O’s and U’s, and O’s and N’s, whereas the O key differentiates only O’s from U’s. The N key produces no significant differences. None of the keys yields significant differences between U’s and N’s.

6 X 8 in. photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

A measure of the congruency between stated occupation choice and Strong measures was available for 265 subjects. The percentage of each group receiving A scores on the Strong in their occupational choice is: U’s, 36.7; N’s, 34.3; and O’s, 32.2. The differences among groups are not significant.
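The median-split comparisons reported in Table 2 can be checked with a short sketch. It assumes the uncorrected Pearson chi-square for a 2 x 2 table (the article does not state whether a continuity correction was applied) and recovers cell counts by rounding the published percentages. The function names are illustrative.

```python
def counts_from_percent(pct_above, n):
    """Recover (above, below) median counts from a published percentage."""
    above = round(pct_above * n / 100)
    return above, n - above


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Artist scale: 46.0 per cent of 139 underachievers and 60.8 per cent
# of 166 overachievers scored above the median of the total group
u_above, u_below = counts_from_percent(46.0, 139)
o_above, o_below = counts_from_percent(60.8, 166)
chi2 = chi_square_2x2(u_above, u_below, o_above, o_below)
```

Under these assumptions the result rounds to 6.67, the value printed for Artist (U vs. O) in Table 2, and exceeds 3.84, the critical value at the .05 level for one degree of freedom.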


Discussion

Though a clear interpretation of results is hampered by the empirical nature of the instrument, two aspects compel attention. The occupational scales apparently discriminate among the achievement groups with greater than chance frequency and consistency.³ Scoring keys, empirically developed from two samples, separated the remaining samples with statistical significance. Both of these events can be viewed as evidence that achievement as measured in this study is not an artifact produced by the unreliabilities of either the predictors or grades. Further, evidence is supplied that there is a relationship between achievement status and responses to the Strong items.


³ Since scale scores cannot be treated as independent events, estimates of chance expectancy can only be approximately determined.

The ability of empirical scoring keys of low reliability to separate the experimental groups offers some promise for eventual development of keys which would add signifi-




Table 2 


Achievement Status and Strong Vocational Interest Blank Scores 


Per Cent above Median 


Under- Normal- Over- Chi-Squares Significant at .05 Level 
achievers achievers achievers 
Scale N=139 N=175 N=166 Uvs.N Uvs.O Nvs.O 
Artist 46.0 434 60.8 — (3-2) 6.67 (5-0)! 10.35(4*-1) 
Psychologist 43.9 40.6 614 — (3-2) 9.38 (5*-0) 14.85(5*-0) 
Architect 53.2 46.3 53.6 — (3*2) — (3-2) — (41) 
Physician 496 463 53.6 —@) — 8-2) — (41) 
Dentist 43.8 46.9 53.6 — (2-3) — (4) — (#1) 
Group 12 465 487 54 = 3-1) — (40) = (41) 
Mathematician 48.9 42.3 57.0 — (41) — (41) 7.33 (5*-0) 
Engineer 54.7 48.6 44.0 — (4*-1) — (5-0) — (3-2) 
Chemist 50.4 48.0 50.0 — (41) — (3-2) — (41) 
Group TE 465 419 57.9 — G1) — 61) 6.39 (3*-1) 
Prod. Mgr. 34 s4 46 =e =D — 32) 
Aviator 35.3 52.1 42.1 =) £.26(4-0) = 3) 
Farmer? 53.5 53.0 42.1 — (1-3) — (3-1) — (8-1) 
Carpenter 50.9 50.9 40.6 — (2-2) — (3*-1) — (BH) 
Printer? 31.8 49.6 49.6 — (2-2) — (2-2) — (2-2) 
Math.-Sci. ‘Teacher 55.4 474 48.8 — (41) = (1) — (3-2) 
oliceman? 51.8 52.1 46.6 — (2-2) = (Bt) — (3-1) 
Forest Service? 56.1 504 8.6 — (4-0) 3.84(4*-0) — (2*-2) 
YMCA Phys. Dir. 46.0 46.3 53.6 — (3-2) — (41) — (3*2) 
'ersonne] Dir, 47.5 49.7 51.8 — (3-2) — (3-2) — (4) 
YMCA Secretary 153 74 538.9 — (3-2) — 1) — (2-3) 
ocial-Sci, Teacher 43.8 48.0 54.8 — (3*-2) — (5*-0) — (2*-3) 
City School Supt. 41.7 46.3 61.4 — (4*-1) 11.80(5**-0) 7.87(5*-0) 
Minister 45.3 45.7 60.8 — (2-3) 6.67 (5-0) 7.83 (4*-1) 
toup V 43.2 50.3 54.2 — (4-1) — (41) — (2*-3) 
Musician 46.0 46.9 57.8 — (2-3) 4.22(5-0) 4114-1) 
CPA, 16.0 Ba 50.4 — (2-3) 5.42(3.5"-1.5)  8.668*™-2) 
Accountant 16.0 53.1 524 = 3-2) — (41) — (2-3) 
ice Worker 50.4 52.0 47.0 — (41") — (2.5-2.5) — (41) 
“chasing Agent 49.6 52.6 46.4 — (3-2) — W = 4-1) 
anker? 475 56.4 47.4 — 8-1) — (1.5-2.5) — (3+1) 
Group vig 51852448. = 6-1") — (2-2) — (3-1) 
Sales Manager s4 554 H0 Zea 39560) 447 (E1) 
“al Estate Salesman 54.7 57.7 42.8 — (3-2) 4.29 (5-0) 7.61 (5**-0) 
cife Insur, Salesman 50.4 49.1 45.8 — (2-3) — (41) — (3-2) 
roup TX? 49.1 53.0 41.4 — (3-1) — (40) — Gel) 
Advertising Man 48.2 47.4 52.4 — (3-2) — (3*-2) — (2*3) 
awyer 468 48.6 57.8 = (3*2) — (32) — (4*1) 
“thor-Journalist 48.2 474578 — r) =Q p 
= a0 ao š — (3-1) — (2-2) 429(2*-2) 
gaine 11 52.1 46.6 — (2-2) — @-1) = (3-1) 
Mo Pational Lever 48.2 56.0 59.1 = E1 =F) — (2*-2) 
aa Rem. 56.8 57.7 40.1 — (3-2*) 8.23 (4*-1) 10.26 (5*-0) 
crest Maturity? 43.4 48.7 55.6 — (3-1*) — (40) — (1*-3) 


¹ The first number in parentheses gives the number of times the direction of the results of the samples was the same as that of the combined group. The second number indicates reversals. Asterisks are added for each sample significant at the .05 level (*) or at the .01 level or better (**).
² Comparisons based on four samples. Number of subjects: U = 114; N = 117; O = 133.
³ Normalachievers exceed underachievers in three samples.




Table 3 


Application of Achievement Keys to a New Sample 


Per Cent above Median 


Chi-Squares Significant 


Under- Normal- Over- at the .05 Level 

iev hievers achievers o 

eee See Te E 

U key 59.3 31.9 36.7 = 9.37 BL 

N xi 50.0 49.4 39.8 — aye Z- 

O key 38.4 49.4 56.1 = 5. a 

O-U score 41.9 45.7 65.3 = 10.16 g i 
cantly to the present predictors. However, the wide variation in the nature of the items comprising the key points to the difficulty of identifying variables by this approach.

Original impetus for the inclusion of the Strong in the present diagnostic battery came from its earlier apparent support of a hypothesis developed elsewhere (8, 9). Stated briefly, it was hypothesized that the extent to which behavior favorable to high grades will persist at the college level will be a function of the degree to which certain moral and cultural values have been internalized, i.e., positive deviation from predicted scores will be directly related to the phenomena variously labelled “superego,” “conscience,” “moral fiber,” “goodness,” etc.

Table 2 shows that in Group V, the so-called “goodness” group, when the samples are combined, on no scale do half of the U’s exceed the median score. Correspondingly, more than half of the O’s exceed the median on all Group V scales. Further, on three of the scales, the O’s exceed the U’s in all five samples. These findings, along with the incidence of statistical significances, indicate a relationship of Group V scales to academic achievement. This relationship was also found by Melville and Frederiksen (5) and by Morgan (6). Though the results are in the direction which the hypothesis would predict, it is still difficult to gauge the amount of support given to it. The instrument is empirical and any argument of support for the hypothesis must obviously involve a certain amount of tenuous reasoning.

The earlier finding that O’s obtain lower M-F scores had been viewed as possible support for the hypothesis. This finding was corroborated by results yielded by additional samples. But, again, the difficulties in interpretation outlined above prevail.

The Group V scales merit special attention because of their possible measurement of a variable hypothesized to be related to achievement. Nevertheless, other occupational groups show similar discriminatory ability. In addition to Group V, O’s score highest on occupational Groups I and II and also on the scales for Musician and C.P.A. Overachievers score lowest on occupational groups IV and IX. These findings do not bear on the authors’ hypothesis. It does seem, however, that scores on occupations requiring extensive academic training are positively related to achievement.

Considerable agreement is found between our results and those of Melville and Frederiksen (5). The lack of greater agreement (especially on groups II and IV) may be due to differences in subjects. Melville and Frederiksen tested freshmen engineering students whereas the bulk of the present study’s subjects were liberal arts students. Perhaps there are some personality or interest factors related to general academic achievement and others related to specific achievement.

Kendall (4) and Ostrom (7) found a positive relationship between achievement and O-L scores. The differences in design between these studies and the present one render direct comparisons difficult. Nevertheless, results of the present study are in the same direction as those obtained by these authors.

Common among educators is the assumption that many students fail to achieve because of a disparity between occupational aims and measured interests. The results of this study fail to show that achievement is related to the congruency of occupational choice and scale scores. Morgan (6), using somewhat different criteria of achievement, found a negative relationship between such congruency and achievement.

A comparison among the experimental groups in the combined samples yields 23 differences which are significant at less than the .05 level of confidence. Yet, none of these differences is obtained between U’s and N’s. Similarly, the empirical achievement keys, though able to distinguish the O’s from the other groups, are unable to separate the U’s from the N’s. This result may be: (a) an artifact produced by the instrument; (b) due to the curvilinear nature of the variables related to achievement; or (c) produced by the academic structure which places a premium on overachievement while eliminating the extreme underachievers.

The suggestion of a curvilinear relationship between achievement and some variables has important implications for experimental design. A large portion of published findings in this area is based on two contrasted achievement groups. This implicit assumption of linearity may be unjustified.


Summary

1. The Strong Vocational Interest Blank for Men was administered to three groups of subjects (designated as underachievers, normalachievers and overachievers) who were equated for general predicted score but who differed in academic achievement. The blanks were scored for all occupations and comparisons among the groups were made by means of the chi-square technique. Empirical scoring keys were developed from two samples and cross-validated against three other samples.

2. The incidence of significant results obtained with both the occupational scales and the empirical keys was viewed as a demonstration of some relationship between achievement status and response to the Strong items. The discriminatory ability of the Group V scales was regarded as lending possible support to the hypothesis that deviation from predicted grades is associated with a variable described as acceptance of or conformity to certain cultural values.

3. In general, the Strong does not seem highly appropriate for the measurement of the

alg, Re A 
ah ingi@tthor’s results with the Rorschach (9) 
resented ieee relationship similar to 




theoretical variable specified in the hypothesis. 

4. Congruency between stated occupational 
aims and interests as measured by the Strong 
does not appear to be related to academic 
achievement. 

5. Scale scores do not show a linear rela- 
tionship with achievement; the overachievers 
appear to be the discrete group. 

Received October 5, 1953. 


References 


1. Crawford, A, B. and Burnham, P. S. Forecast- 
ing college achievement. New Haven: Yale 
Univ. Press, 1946. 

2. Garrett, H. F. A review and interpretation of 
investigations of factors related to scholastic 
success in colleges of arts and sciences and to 
teachers’ colleges. J. exp. Educ., 1949-1950, 
18, 91-138. 

3. Harris, D. Factors affecting college grades: a 
review of the literature, 1930-1937. Psychol. 
Bull., 1940, 37, 125-166. 

4. Kendall, W. E. The occupational level scale of 

the Strong Vocational Interest Blank for Men. 
J. appl. Psychol., 1947, 31, 283-287. 

5. Melville, S. D. and Frederiksen, N. Achievement of freshman engineering students and the Strong Vocational Interest Blank. J. appl. Psychol., 1952, 36, 169-173.

6. Morgan, H. H. A psychometric comparison of 
achieving and nonachieving college students of 
high ability. J. consult. Psychol., 1952, 16, 
292-298. 

7. Ostrom, S. R. The OL key of the Strong Voca- 
tional Interest Blank for Men and scholastic 
success at college freshmen level. J. appl. 
Psychol., 1949, 33, 51-54. 

8. Rust, R. M. and Ryan, F. J. The relationship 
of some Rorschach variables to academic be- 
havior. J. Pers., 1953, 21, 441-456. 

9. Ryan, F. J. Personality differences between un- 
der- and over-achievers in college. Ph.D. 
thesis, 1951, Columbia University. Univer- 
sity Microfilms, Ann Arbor, Mich., Publ. No. 
2857. 

10. Strong, E. K., Jr. Vocational interests of men and women. Stanford University: Stanford Univ. Press, 1943.

11. Travers, R. N. W. Significant research on the prediction of academic success. In Donahue, W. T., Coombs, C. H., and Travers, R. N. W. (Ed.). The measurement of student adjustment and achievement. Ann Arbor: Univ. of Michigan Press, 1949. Pp. 147-190.

12. Young, C. W. and Estabrooks, G. H. Scale for Measuring Studiousness by means of the Strong Vocational Interest Blank for Men. Stanford University: Stanford Univ. Press, 1936.

13. Young, C. W. and Estabrooks, G. H. Report on the Young-Estabrooks Studiousness Scale for use with the Strong Vocational Interest Blank for Men. J. educ. Psychol., 1937, 28, 176-187.

The Journal of Applied Psychology
Vol. 38, No. 5, 1954 


Long-Term Validity of the Strong Interest Test in Two 
Subcultures 


Charles McArthur 


Department of Hygiene, Harvard University * 


Surprisingly few long-term follow-ups have 
been made on the Strong Vocational Interest 
Blank, when one considers that the test has 
now been in use two decades. The largest 
studies are nine and ten year follow-ups re- 
ported in Strong’s original volume (21), which 
are later supplemented by a twenty year re- 
port (22) on the same group. Unfortunately, 
as Super remarks (24), “the data are not so 
organized as to show what percentage of men 
entered and remained in fields in which they 
made A, B+, or lower scores.” Instead, 
Strong (21) adduces support of four rather 
indirect propositions: 


1. Men continuing in occupation A obtain 
a higher interest score in A than in any other 
occupation. 

2. Men continuing in occupation A obtain 
a higher interest score in A than other men 
entering other occupations, 

3. Men continuing in Occupation A obtain 
higher scores in A than men who change from 
A to another occupation. 

4. Men changing from occupation A to occupation B score higher in B prior to the change than in any other occupation, including A.

A special twen 
(23) dealt with medical intere 
was reported in 


Of 
108 Stanford alumni w 


ho were physicians 


men who made careers as 


“high” physician score when tested in college, 


Procedure 


The Sample. 
the Study of Adult 


dult Development (the 
of Hygiene, Harvard 


cational Interest Blank by Dr. F. L. Wells in the academic year 1939-1940. These young men were part of a longer series selected for interdisciplinary long-term study on the basis of their apparent “normality.” All were at that time sophomores in Harvard College. Heath ( ) has described the original program in detail.

This Study probably has the lowest rate of drop-outs of any existing longitudinal program. Of the 63 men given the Strong, one has requested to be excused from further participation; all the rest are in close contact. It happens that the drop-out can be used in numerical summaries, since his occupation is known from perfectly public sources. Two men were lost during World War II, however.

We have, then, 61 cases on which to test the predictive power of the Strong over a fourteen year interval, from 1939 to 1953.

SVIB as a Predictor. How well did the Strong taken in college predict the occupations of these men fourteen years later? The basic detailed data for answering this question are given in Table 1.¹ Reported in Table 1 is the current job-title and the name of the Strong scale regarded as falling nearest to that job-title. Most selections are self-explanatory. One was semi-empirical; there being no scale for applied economists, it turned out that Office Man often came nearest. Disguises occur but only in the direction of generalizing the job-title to make it less individually identifiable. The last two columns of Table 1 are mildly subjective evaluations by the investigator. It seemed necessary to indicate whether or not a scale offered a “Direct” or “Indirect” measure of interest in the occupation entered. The indirect measures are often not a fair test at all, yet a counselor might in practice be forced to make just this sort of inference (e.g. using the Author-Journalist scale to assess the advisability of teaching Drama) for the lack of other evidence. In the last column, an assessment of the correctness of prediction is made in terms of “Good Hits,” “Poor Hits,” and “Clean Misses.” The definitions of these types are implicit in the claims made by Strong,


der 

¹ To reduce printing costs, Table 1 has been deposited with the American Documentation Institute. Order Document No. 4325 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington 25, D. C., remitting in advance $1.25 for 35 mm. microfilm or $1.25 for 6 X 8 in. photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.




Table 2 


Fourteen Year Validation: Strong Vocational 
Interest Blank 


Validity Direct Indirect Total 
Good Hit 22 5 27 
Poor Hit 7 5 12 
Clean Miss 14 7 21 
Total 43 17 60 


so that a good hit may be counted when a man enters an occupation for which he scored A or which had the 1st, 2nd, or 3rd highest ranking score on his test. Less credence is given to a B+ score when it is outranked by many others, though such scores are usually regarded as “worth some consideration” in counseling. They are here called “Poor Hits.” Anything below these criteria is taken to be a “Clean Miss.”
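The three-way scoring rule just described can be written out as a small sketch. The function name and argument conventions are illustrative assumptions, and the text leaves some edge cases open (for instance, exactly how many higher scores count as "many others"), so this is one plausible reading rather than the investigator's exact procedure.

```python
def classify_hit(letter, rank):
    """Classify a prediction from the letter rating and the rank
    (1 = highest) of the scale nearest the occupation entered."""
    # Good Hit: an A rating, or among the three highest-ranking scores
    if letter == "A" or rank <= 3:
        return "Good Hit"
    # Poor Hit: a B+ that is outranked by several other scales
    if letter == "B+":
        return "Poor Hit"
    # Anything below these criteria
    return "Clean Miss"


# Hypothetical cases
examples = [
    classify_hit("A", 7),    # A rating, though outranked
    classify_hit("B+", 2),   # second-highest score on the blank
    classify_hit("B+", 9),   # B+ outranked by many others
    classify_hit("B", 12),   # below both criteria
]
```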


Sixty cases could be used for validation, one man (No. 63) being in an occupation for which no scoring scale seemed even indirectly pertinent. It becomes apparent by inspection of Table 2 that some accuracy is lost through the necessity of using indirect measures. The fairest evaluation of the Strong’s predictive power may be had from the 43 men whose occupations can be directly tested. Of these, only one-third are Clean Misses. About half were hit well.

These figures are slightly lower than those given by Strong in his follow-up of medical interests. There, about one out of four tests turned out to be complete misses. Yet one must remain pleased with an instrument that under “blind conditions” (these tests were all unscored until 1952) predicts future behavior even half the time.
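The fractions quoted from Table 2 follow directly from its counts, as this short sketch shows. The counts are taken from the table; the variable names are illustrative.

```python
# Table 2 counts as (direct, indirect) measures
table2 = {
    "Good Hit": (22, 5),
    "Poor Hit": (7, 5),
    "Clean Miss": (14, 7),
}

direct_total = sum(direct for direct, _ in table2.values())
overall_total = sum(direct + indirect for direct, indirect in table2.values())

# Share of each outcome among the directly scaled occupations
direct_share = {
    outcome: direct / direct_total for outcome, (direct, _) in table2.items()
}
```

Among the 43 directly tested men, Clean Misses come to about 33 per cent and Good Hits to about 51 per cent, matching "only one-third" and "about half" in the text.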




Strong’s First Proposition 


Had a counselor used these tests, in 1939, to suggest to the boys their likeliest future vocations, he would have been downright misleading only once in every three attempts. Yet even the “good” tests would have presented him with a grave difficulty: the tests containing accurate predictions also contain too many “extraneous solutions.” Like a mathematician solving a cubic equation, the counselor must enter the problem with the expectation that not all the answers offered will be real and pertinent.

Whatever its letter rating, the scale most 
pertinent to future choice of occupation 
ranked anywhere from 1st to 33rd highest out 
of the 44 scales for which each test was 
scored. The median rank of the most perti- 
nent scale was 5th. That means that the 
counselor using these tests could have ex- 
pected, on the average, four “extraneous solu- 
tions” with higher-ranking scores than the 
true solution. It is, of course, true that the 
“extraneous” quality of certain high scores is 
obvious: few would counsel a tone-deaf boy 
to be a musician. 

Strong (21) states that “a college student 
who continues ten years in the same occupa- 
tion enters an occupation in which he ranks 
second or third best.” Like our group as a 
whole, our men who continued in the same 
occupation (not considering interruption by 
the war) entered occupations in which, on the 
median, they ranked fifth best. Once again, 
our figures are slightly less impressive than 
Strong’s. It is certainly not true that among 
our cases men “continuing in occupation A 
obtain a higher interest score in A than in 
any other occupation.” 


Strong’s Second Proposition 
The proposition that men engaged in an 
occupation score higher on that occupational 
Table 3

Testing Strong’s Second Proposition

                               Average Score    Average Score
                               of Men Engaged   of All
Occupation                     in Occupation    Other Men
Physician (N = 12)                  42.3             32.8
Lawyer (N = 11)                     40.6             30.5
Public Administrator (N = 5)        45.8             39.6
Engineer (N = 4)                    53.8             30.1
Chemist (N = 3)                     45.0             33.2
Minister (N = 2)                    44.0             29.2



scale than all other men is well supported by 
our data. That is, doctors outscore controls 
on the Physician scale, lawyers outscore con- 
trols on the Law scale, etc. (Controls are 
simply all the rest of the 61 cases.) This is 
true for every directly scaled occupation that 
occurs more than once. 

Strong’s second proposition seems to be 
valid. 


Strong’s Last Two Propositions 


Seventeen of our sixty men have made 
changes in occupation other than shifts en- 
forced by entering the armed services. Often, 
these men abandoned two or more vocations 
before settling on the job they are engaged in 
today. Strong’s follow-up data showed that 
men who abandoned an occupation were likely 
to possess lower scores on that occupational 
scale than the scores made by men who con- 
tinued on the job. 

Table 4 tests that Proposition in our own 
figures. Strong found that rule to hold “ex- 
cept for the records of two individuals,” while 
we, except for one instance of tie, find it to 
be entirely so. 

Another generalization Strong offers about 
men who change vocational fields is that they 


Table 4

Testing Strong’s Third Proposition

                               Men Continuing     Men Leaving
Occupational Scale             Mean Score   N     Mean Score   N
Physician                         42.3     12        35.5      2
Lawyer                            40.6     11        33.0      3
Public Admin.                     45.8      5        42.5      4
Author-Journalist (Teaching)      49.7      3        33.0      3
Engineer                          53.8      4        34.0      2
Office Man                        44.0      2        35.0      3
Production Mgr.                   30.0      2        21.0      2
Pres. Mfg. Co.                    25.3      3        34.0      1
Physicist                         42.5      2        34.0      2
Chemist                           45.0      3        18.0      1
Author-Journalist (Writing)       54.0      1        32.5      2
Minister                          44.0      2        35.0      1
Salesman                          42.0      1        30.0      1
Senior C.P.A.                     43.0      1        43.0      1



will proceed from a field in which they have a low score into a field in which they score high. That was true of 9 of our changeable men, 7 men going contrary to their tests and entering new jobs for which their test scores were lower. (One man changed between jobs with identical scores.) These figures run faintly in the right direction, probably looking even less convincing than the data from which Strong felt that proposition 4 was “almost but not quite sustained.”


Contentment in Occupation 


As Strong has pointed out (23), “the validity of an interest test should be measured in terms of satisfaction” but for this “there is no satisfactory measure.” The Study of Adult Development has accumulated much data on expressed satisfaction and dissatisfaction with occupational choice, through the use of annual questionnaires. Even if we heed the force of Murray’s (16) warning that one must draw conclusions from inferred sentiments, not from expressed sentiments, we may ask some operational questions about the relations between expressed satisfaction in 1953 and the interest score obtained 14 years previously.

The 1953 questionnaires were still coming in when this was written. Of the 60 men in whom we are interested, 37 had returned their questionnaires. There was, as a matter of fact, some tendency for the men engaged in occupations for which they possessed a favorable Strong score to return their questionnaires early! (Three-quarters of them had done so, as against half the men with lower scores. For this Fisher’s “p” comes out [illegible].) This is not so trivial an indication as it may appear; the Study staff has long been aware that among people who are hardest to hear from are those who have a sense of not having succeeded.


n 
, Several 1953 questions were pertinent oe 
inferrable sentiment of job satisfaction. gp- 
may be abbreviated as: (a) Are you are? 
templating a change in the near Ngo 
What Considerations entered this? ins? 
What extent has the job produced st!" pe 
(c) What Special even 


n 
ts have occurred ? your 
last year? (d) What is your outlook 0” ° 


neat 
nav” 


Long-Term Validity of Strong Interest Test 


Table 5

Job Satisfaction and Strong Score

                                 Apparently   Express
Score                            Happy        Discontent   Total
“A” on Strong scale                 14            3          17
Lower scores on Strong scale        10           10          20
Total                               24           13          37


personal future? And what is the principal basis of this? Not rarely, a participant makes use of the backs of the questionnaire to write us a letter in which discussions of job problems may be found.
There were thirteen men, in all, who showed some evidence of discontent, in answer to one or another of the questions. These thirteen, who are “less than completely happy” about their jobs, include disproportionately few who scored A on the Strong. Table 5 gives the figures. Fisher’s “p” comes out less than .05.
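The exact probability cited for Table 5 can be reproduced with a minimal sketch of Fisher's exact test. A one-tailed test in the predicted direction is assumed here, since the article does not say whether one or two tails were used; the function name is illustrative.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact probability for the table [[a, b], [c, d]]:
    the chance of a count of `a` or more in the top-left cell when all
    row and column totals are held fixed."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric probabilities summed over the more extreme tables
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


# Table 5: rows are "A" scorers and lower scorers;
# columns are apparently happy and expressing discontent
p = fisher_one_tailed(14, 3, 10, 10)
```

Under this assumption the probability is a little over .04, consistent with the statement that “p” comes out less than .05.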
The question about job strains is the only one of those contributing to this general result that itself approaches significance. Though the figures in Table 6 are not impressive, “p” comes down to .08.

The contributions of the other questions, though in the predicted direction, are too small in numbers to reach significance. (An example: men now occupying the lower-rated

Table 6 


Job Strains and Strong Score 


Rati No Strains Reported 
A ae Reported Strains Total 
or B 
Stron 
Rating s m 
as 2 
Lower 
Strong 
Rating 
8 5 13 
Tota oy ia r 
Pe 29 8 7 


349 


occupations are twice as frequently contem- 
plating a change.) 
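
The two Fisher probabilities quoted for Tables 5 and 6 can be checked with a short calculation. The sketch below is not the author's own computation; it assumes a one-tailed exact test with all marginal totals fixed, and it assumes the first row of Table 6 is 21 and 3 (the printed totals of 29 and 8 less the lower-rating row of 8 and 5). The function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-tailed Fisher exact probability for the 2x2 table [[a, b], [c, d]].

    With every marginal total held fixed, returns the chance of a
    top-left cell at least as large as the observed value a.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    largest = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, largest + 1)
    )
    return tail / comb(n, row1)


# Table 5: rows = "A" vs. lower Strong score; columns = happy, discontent.
p_table5 = fisher_one_sided(14, 3, 10, 10)

# Table 6: rows = A or B vs. lower rating; columns = no strains, strains.
# The first row (21, 3) is an assumption derived from the printed totals.
p_table6 = fisher_one_sided(21, 3, 8, 5)

print(f"Table 5: p = {p_table5:.3f}")
print(f"Table 6: p = {p_table6:.3f}")
```

Under these assumptions the Table 5 value falls below .05 and the Table 6 value comes out close to .08, in line with the figures given in the text.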


Other Evidence 


These findings, though not so favorable to 
the test as Strong’s results, nonetheless sug- 
gest that the test has its usefulness. Further- 
more, someone familiar with the Study partici- 
pants cannot read through Table 1 without 
acquiring some feeling that, however inac- 
curate its predictions of behavior, the test 
is measuring interests. There is the evidence, 
for example, of the correlated pair of scores: 
Lawyer and Public Administrator. Some men 
enter the law because they have politics in 
mind. Cases 20, 25, and 27 are examples. 
In case 27, the Public Administrator score 
matches that for Lawyer. In case 25, the 
Lawyer score is low; the choice of lawyer 
would seem to have been contraindicated. 
That would have been correct. Case 25 
escapes being one of our dramatically un- 
happy group only because the practice of law 
is rationalized as a means to a political end. 
The Strong has measured the relative interest 
in law and politics quite accurately. Indeed, 
the suggestion of power motives given by the 
Strong is more than borne out by projective 
tests. (Case 24 is in sharp contrast. Though 
actually working for the government, this 
man is not interested in politics. That is 
what his Strong scores fourteen years ago 
predicted.) Some indication of the injustice 
of “occupations entered” as a criterion of in- 
terest may be had from case 20. In the table, 
this man is reported as a lawyer and his low- 
ish score on that scale makes him count in 
the validation as a “Poor Hit.” Yet he, too, 
intends to use law as a stepping-stone into 
politics, a fact that was not shown in the 
table, since circumstances have prevented his 
carrying out his plans. His score on Public 
Administrator is an A. That is also the scale 
on which he ranks first. 

One is impressed by the logic underlying the relative efficacy of the
test in predicting well or poorly certain occupational choices.
Engineers, ministers, and teachers seem to be highly predictable; all
three are likely to choose their vocation in response to an inner
“call.” By contrast, for men who are in their own business (which, for
all three under that heading in Table 1, means an “externally
prescribed” choice) the Strong simply does not predict. Another way of
stating these facts would be to assume that the Strong tested interest
and that the difference in prediction represented differences in the
importance of interest as a factor in various sorts of career choice.
The very patterning of the failures of the test therefore confirms its
validity as a measure of interest!


Private and Public School Results 


Suppose we explore the consequences of postulating that the Strong
does measure interests. We infer that the test will predict future
job-choices only for those men who (consciously or unconsciously) give
weight to their own interests when they choose a career. For men who
do not follow their interest, the test will not predict. We therefore
expect the Strong’s “validity” to vary between groups known to take
their own interests more or less seriously. A major instance of such a
prediction is provided by our tests from men who prepared for Harvard
at public and private secondary schools.


The public school boy has usually been raised in the “American success
culture,” described by many anthropologists (1, 2, 4, 10, 15). His
parents’ efforts focussed on preparing the boy for future vocational
achievement. Job choice has been for him a vital matter; his future
self-estimate will hinge on his job-title and on how well he does
within his occupational field. As one Study participant explained it,
“I have satisfied myself as to my ability to compete successfully with
most of my contemporaries.”

The private school boy will often have been reared in a variant
orientation, ably described by Florence Kluckhohn (10), where
child-rearing was intended to perpetuate in him a “preferred
personality.” Occupational role will have been subordinated to family
social patterns. In our 1953 questionnaires eleven private school boys
but only three public school boys put family interest or personal
breadth ahead of achievement values when discussing their “personal
future.” As Kluckhohn so nicely phrased it, the contrast is between
two subcultures, one emphasizing a “Doing,” the other a “Being,”
orientation.

One consequence of this subcultural contrast is a difference in the
importance assigned to interests when men make their vocational
choice. In the “success culture” a son is expected to surpass
(therefore often bypass) his father’s occupation. Choosing a job is
for him a vital matter, the more so because the choice is so greatly
“up to him.” So much hinges on his making a “right” choice, calculated
to yield maximal success, that he will often consult his own interest
pattern, either introspectively or with formal aid from a vocational
counselor. By contrast, the purest case of the upper class variant is
a man whose permitted choices are limited to three: trustee, lawyer or
doctor. Patricia Smith (20) has described the sanctions that suppress
other alternatives. (The Study has witnessed dramatic conflicts within
upper class men whose personal “calls” gave way before the pressure of
tradition.) While the average private school boy is not subjected to
so focal a pressure, he will nevertheless possess values reinforcing
the tangible demand that he join his father or uncle in The Business
and the intangible expectation that he will first of all be the Right
Sort. As one participant wrote, “As near as I can tell I have those
(personal) qualities in some small measure, so I think it foolish to
spend time thinking about my future.”


If all this is true, we arrive at the prediction that interests will
matter less and therefore the Strong will be less valid when applied
to the behavior of private school boys. Table 7 shows this to be the
case. Chi square suggests p less than .05; if we combine cells
(avoiding the low cell and isolating the relation between public
school attendance and “Good Hits”), we can apply Fisher’s formula and
arrive at p below .01. Our proposition seems well validated.


Table 7


Validity of Strong Test Applied to Public and Private School Boys


Validity      Public   Private   Total
Good Hit      19       8         27
Poor Hit      4        8         12
Clean Miss    8        13        21
Total         31       29        60


If we translate Table 7 into percentage, we discover that
three-quarters of the public school tests gave some sort of “hit” on
the occupation engaged in fourteen years after testing. That is
exactly the figure reported by Strong (23) for his twenty-year
follow-up. If, on the other hand, we try to apply the test to private
school boys, our predictions will be useless almost half the time.
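
The chi-square, Fisher and percentage figures drawn from Table 7 can be reproduced in a few lines. This is a sketch under stated assumptions rather than the author's worksheet: it assumes an ordinary Pearson chi-square on the full 3 x 2 table, a one-tailed exact test on the collapsed table (Good Hit against everything else), and row totals of 27, 12 and 21. All function and variable names are hypothetical.

```python
from math import comb, exp

# Table 7 counts: rows = Good Hit, Poor Hit, Clean Miss; columns = public, private.
table7 = [(19, 8), (4, 8), (8, 13)]


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


def fisher_one_sided(a, b, c, d):
    """One-tailed Fisher exact probability for [[a, b], [c, d]], margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, row1)


chi2 = chi_square(table7)
# With (3 - 1) * (2 - 1) = 2 degrees of freedom the upper-tail probability
# has the closed form exp(-chi2 / 2), so no statistics library is needed.
p_chi2 = exp(-chi2 / 2)

# Collapsed table: rows = public, private; columns = Good Hit, not a Good Hit.
p_fisher = fisher_one_sided(19, 12, 8, 21)

# Share of tests giving some sort of hit (Good or Poor) in each group.
hit_public = (19 + 4) / 31
hit_private = (8 + 8) / 29

print(f"chi-square = {chi2:.2f}, p = {p_chi2:.3f}")
print(f"Fisher p = {p_fisher:.4f}")
print(f"hits: public {hit_public:.0%}, private {hit_private:.0%}")
```

On these assumptions the chi-square probability lies a little above .03 and the collapsed Fisher probability lies below .01, while roughly three public school tests in four, against a little over half of the private school tests, score a hit of some kind.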

Sorting out the public school cases, we can try revalidating Strong’s
four propositions. Proposition 1 fares better: men engaged in
occupation A still do not have “a higher interest score in A than in
any other occupation,” but the median rank of the pertinent scale is
third, where formerly it was fifth. That is more consistent with
Strong’s dictum, quoted earlier, that the occupation continued in will
have ranked first, second, or third. Proposition 2 is no better for
the public school group alone; that is because some occupations
(engineer, chemist) attract higher scores from public school, while
others (lawyer, minister) attract higher scores from private school.
At any rate, Proposition 2 was already verified sufficiently.
Proposition 3 was already verified in every comparison, and so cannot
be improved. There is one scale (Public Administrator) on which
Proposition [...] is false for the private school group but true for
the public school group. Proposition 4 is about equally valid in both
groups.



Discussion 


This finding will raise various questions, some of which can be
answered from our data. For one, the “private school effect” cannot be
explained in terms of income. It is true that the Strong is less
accurate when applied to families receiving over sixteen thousand
dollars a year, but this figure marks only the upper quartile of our
income statistics, while the “private school effect” is visible at all
income levels. For example, in the second quartile, with income held
reasonably constant, between four and six thousand dollars, public
school tests score good hits 75% of the time, private school tests
only 40%. In all income quartiles that are adequately represented by
public school cases, the proportion of misleading tests remains about
1 in 4; in all income quartiles that are adequately represented by
private school cases, the proportion of misleading tests remains about
1 in 2.

These figures suggest that it is the fact of 
having attended private school (or of being 
reared in a subculture from which one is sent 
to a private school), rather than income, and 
somewhat independently of social class, that 
depressed the validity of the test. Several 
explanations suggest themselves. The most 
obvious would be that the Strong was vali- 
dated against public school graduates. (Re- 
gional differences in patterns of secondary 
education would have led to this circum- 
stance.) Next most obvious might be that 
attending private school is one of those “ex- 
periences affecting interests” that Super (24) 
warns us have been too little studied. 


Related Findings 


The effects of private school mores on per- 
sonality reported here are not isolated phe- 
nomena. Private school boys have previously 
been assumed to possess a special system of
values, by scientists (10, 20, 25, 26), novel- 
ists (11, 12, 17) and deans (27). Empirical 
demonstrations show that their responses dif- 
fer from those of public school boys on 
projective tests (13), especially with regard 
to the projection of need Achievement (14), 
the need that underlies the results reported 
here. What is said here of their attitudes to 
vocational success has long been known with 
regard to their attitude toward academic suc- 
cess (18) and the effect of this attitude on 
their grades has long been empirically dem- 
onstrated (5, 6, 19). Very much that is 
known about this topic remains unpublished. 

It seems to the writer that psychologists in 
Eastern universities, by failing to report the 
public-private school differences in their data, 
are failing to record a fine “natural experi- 
ment” in the laws governing culture and per- 
sonality. The New England private school 
boy is often that rarest of subjects in the
psychological laboratory: a member of one
of America’s geographically scattered upper 
classes (3). The Chicago group (1, 2, 7, 9, 
25, 26) has done much to call our attention 
to differences between middle and lower class 
personalities. Are not differences between 
subcultural personalities in the middle and 
upper classes likely to be just as great? 


Summary and Conclusions 


A fourteen-year follow-up was made of Strong Vocational Interest
Blanks administered in 1939 to participants in the Study of Adult
Development. The validity of the test as a predictor of occupational
choice at first appeared to be slightly lower than that reported by
Strong. Of Strong’s four validation propositions, two were confirmed,
one (that lawyers outscore non-lawyers on the Law scale, etc.)
strikingly, the other (that lawyers obtain one of their best scores on
the Law scale, etc.) less so. The median test offered four
“extraneous” predictions.

It was possible to demonstrate a relation between conformity to
choices commended by the test and future vocational happiness.
Choosing a job for which one had (some years before) scored “A” also
seemed to reduce the likelihood of developing fatigue, irritability or
other symptoms of strain.

The proposition was offered that SVIB validly measured interests but
that failure [...] supported this idea as [...] in occupations which
[...] accurately and which it did not.


As a corollary of this proposition and on the basis of what has been
learned elsewhere, it was predicted that the Strong would be
applicable to boys who attended a public secondary school but less
useful for boys who had prepared in a private preparatory institution.
That was the case. The predictive validity of the test among the
public school group was almost exactly that originally reported by
Strong. Among private school boys, the test was, half of the time,
inapplicable. Further, Strong’s first validation proposition was
improved in the public school group, the median test record offering
only two extraneous predictions.

The import of this finding may be read in 
one of two ways. If we assume the an- 
thropological theories about the American 
middle and upper classes to be true, then this 
is a demonstration that “invalidity” in the 
Strong arises because interests do not de- 
termine choice rather than from failure of 
the test to measure interests. On the other 
hand, the implication that there may be a 
distinct psychology of the upper class is also 
pointed out. 

From all this may be drawn the follow- 
ing conclusions: 

1. The Strong has at least the validity 
claimed for it as a measure of interests. 

2. Its most rigorous validation criterion will be the prediction of
actual behavior, but even that criterion is met at least 1 time in 2.

3. We may regard as critical for understanding the use of the test
Strong’s (23) proposed “future calculations as to how much other
factors, such as economic conditions, family pressures, etc., affect a
man’s occupational career.” In this respect attention should be called
to upper class variants of the American personality.

Further study of: (a) the effects of environmental press in conflict
with interests measured by the Strong; and (b) the differences between
public and private school personalities will be made from Study of
Adult Development data.


Received September 21, 1953.


References

1. Davis, A. Social class influences upon learning. Cambridge:
   Harvard University Press, 1948.
2. Davis, A. American status systems and the socialization of the
   child. In Kluckhohn, C., and Murray, H. A. (Eds.), Personality in
   nature, society and culture. New York: Alfred A. Knopf, 1949.
3. Goldschmidt, W. Social class in America: a critical review. Amer.
   Anthropologist, 1950, 52, 483-498.
4. Gorer, G. The American people. New York: Norton, 1948.
5. Harris, D. The relation to college grades of some factors other
   than intelligence. Arch. Psychol., New York, 1931, 20, no. 131.
6. Harris, D. Factors affecting college grades; a review of the
   literature, 1930-1937. Psychol. Bull., 1940, 37, 125-151.
7. Havighurst, R. J. and Taba, H. Adolescent character and
   development. New York: Wiley, 1949.
8. Heath, C. W., et al. What people are. Cambridge: Harvard
   University Press, 1945.
9. Hollingshead, A. B. Elmtown’s youth. New York: John Wiley & Sons,
   1949.
10. Kluckhohn, Florence R. Dominant and substitutive profiles of
    cultural orientations: their significance for the analysis of
    social stratification. Social Forces, 1950, 28, 376-393.
11. Marquand, J. P. The late George Apley. Boston: Little, Brown &
    Co., 1937.
12. Marquand, J. P. Point of no return. Boston: Little, Brown & Co.,
    1949.
13. McArthur, C. C. Cultural values as determinants of imaginal
    productions. Unpublished doctor’s dissertation, Harvard
    University, 1951.
14. McArthur, C. C. The projection of need Achievement: a
    re-examination. J. abnorm. soc. Psychol., 1953, 48, 532-536.
15. Mead, Margaret. Has the ‘middle class’ a future? Survey Graphic,
    1942, 31, 64-67, 95.
16. Murray, H. A., and Morgan, Christiana. A clinical study of
    sentiments. Genet. Psychol. Monogr., 1945, 32, 3-149, 153-311.
17. Phillips, J. The second happiest day. New York: Harper & Bros.,
    1953.
18. Sarason, S. B. and Mandler, G. Some correlates of test anxiety.
    J. abnorm. soc. Psychol., 1952, 47, 810-817.
19. Seltzer, C. C. Academic success in college of public and private
    school students: freshman year at Harvard. J. Psychol., 1948, 25,
    419-431.
20. Smith, Patricia. The problems of occupational adjustment for the
    upper class Boston man. Unpublished honors thesis, Radcliffe
    College, 1950.
21. Strong, E. K., Jr. Vocational interests of men and women.
    Stanford University: Stanford University Press, 1943.
22. Strong, E. K., Jr. Interest scores while in college of
    occupations engaged in twenty years later. Educ. psychol.
    Measmt., 1951, 11, 335-348.
23. Strong, E. K., Jr. Twenty year follow-up of medical interests. In
    Thurstone, L. L. (Ed.), Applications of psychology. New York:
    Harper & Bros., 1952.
24. Super, D. E. Appraising vocational fitness. New York: Harper &
    Bros., 1949.
25. Warner, W. L. Social life of a modern community. New Haven: Yale
    University Press, 1941.
26. Warner, W. L., Havighurst, R. J., and Loeb, M. L. Who shall be
    educated? New York: Harper & Bros., 1944.
27. “The College.” In Reports of the president and treasurer of
    Harvard College, 1923-4. Official register of Harvard University,
    vol. 22, no. 5, February 24, 1925.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


Vocational Interests of Naval Aviation Cadets * 


Nathan Rosenberg and Carroll E. Izard 


The Tulane University 


In interviews with cadets who voluntarily 
withdraw from the Naval Air Training Pro- 
gram, an active dislike of flying was one of 
the most important expressed reasons for 
withdrawal (1). An attempt was made, 
therefore, to investigate the importance of in- 
terests as a correlate of success in Naval 
Aviation. This attempt was directed toward 
an examination of broad interest patterns of 
cadets through measurement of their voca- 
tional interests rather than dealing with
specific interests in flying and the training
program itself. Since questions about flying 
and the program are avoided in tests of voca- 
tional interests, such measures were consid- 
ered more subtle, less subject to momentary 
fluctuations in attitudes that seem present in 
newly arrived cadets, and of greater psycho- 
logical importance. 

The Kuder Preference Record, Vocational, Form B (3), was chosen to
measure vocational interests since it is one of the interest
questionnaires which has been most widely studied for validity. Form B
was selected because it had been administered in World War II to a
population of Air Force cadets. The writers feel that Navy and Air
Force [...], and the proposed comparison will present definitive
evidence with regard to measured vocational interests. Generalizations
from World War II Air Force data are often made concerning the
importance of many psychological characteristics for selection of
pilots. If this Air Force population differs in important respects
from other aviation populations, such generalizations should be
tempered.

At the outset certain methodological considerations should be noted.
It is reasonable to assume that certain vocational interest patterns
may cause cadets to enter Naval Air Training. Once this pre-selection
has operated, there may or may not be a relationship between interests
and successful completion of training. That is, interests may cause
entry into training but they may or may not be predictive of success
after pre-selection has occurred. Thus, it is important to consider
whether naval aviators possess distinguishing interests prior to entry
into training.


* This article was presented as a report to the U. S. Naval School of
Aviation Medicine, Pensacola, Florida, under ONR Project NR154-098.
Opinions or conclusions contained in this report are those of the
authors. They are not to be construed as necessarily reflecting the
views or possessing the endorsement of the Navy Department.


Should selective drop-out during training occur, it is possible that
interests operate as a post-selective device. This implies a
correlation between interests and successful completion of training.
This correlation is most adequately tested by a longitudinal study in
which entering cadets are tested and then followed through the program
to identify successful and non-successful cases. A compromise to this
longitudinal study is afforded by the cross-sectional approach.
Entering cadets, non-successful cadets, and successful cadets are
tested and their mean interest scores compared. Mean scores which
differ systematically are interpreted as evidence for a correlation
between interests and successful completion of Naval Air Training.

Should training, or factors operating during training, change the
interests of entering cadets, the change might contaminate inferences
regarding test validity. For example, maturation of cadet interests
over a nine-month training period might well affect apparent test
validity. In this report, selective drop-out during training is
assumed to result from differences in interests separating an
attrition and successful group. The preceding considerations have been
made so that appropriate safeguards will be followed in interpreting
the results.


The following questions are considered in this report:

1. Do the vocational interests of entering Naval Aviation Cadets
differ significantly from an unselected vocational group, namely the
norm group found in the test manual for the Kuder Preference Record?

2. Do the vocational interests of successful Naval Aviation Cadets
differ significantly from an attrition population of cadets who
withdraw at their own request?

3. Do the vocational interests of present-day Naval Aviation Cadets
differ significantly from a wartime Air Force population of cadets?


Table 1


Means and Standard Deviations for Kuder Interest Scores on Various Groups Considered


                Entering Naval                  (DOR) Voluntary World War II    Kuder’s
                Aviation        “Successful”    Withdrawals     Air Force       Normative
                Cadets          Cadets          from Training   Cadets          Population
                N=651           N=137           N=137           N=937           N=2667
Kuder Interest
Area            M      SD       M      SD       M      SD       M      SD       M      SD
Mechanical      81.5   17.8     85.8   16.3     73.3   20.9     86.0   15.6     78.6   22.8
Computational   33.9   11.1     33.2   11.3     33.3   12.6     33.2   9.3      35.3   10.6
Scientific      70.7   14.8     68.4   14.5     61.1   16.4     67.6   12.6     64.0   15.5
Persuasive      71.7   18.8     73.9   19.9     82.3   20.4     68.4   16.8     74.4   20.6
Artistic        50.9   13.4     53.1   14.4     50.4   16.0     49.3   13.3     46.1   13.6
Literary        38.6   13.6     35.5   [...]    41.1   14.4     46.4   13.3     47.8   15.1
Musical         19.6   9.5      17.2   8.6      19.7   9.5      19.0   9.0      16.6   9.6
Social Service  66.4   17.0     65.7   16.3     69.3   16.5     63.7   14.3     73.7   17.5
Clerical        41.0   11.6     42.7   12.7     44.9   14.1     46.4   12.1     52.1   13.5

Procedure 


Samples Used. 1. Entering classes 3-53 through [...] and classes 16-53
through 23-53 were tested. A total of 16 classes consisting of 651
subjects were included in this group. By the completion of Naval Air
Training, about 15 per cent of an entering class will have withdrawn
voluntarily. Attrition from all other causes generally averages to
about this same percentage; hence total attrition averages about 30
per cent.

2. The successful group consisted of 137 cadets who were tested at
Corry Field, approximately nine months after entry into training.
Based upon previous experience, it is estimated that [...] 90 per cent
of these subjects will graduate.

3. A total of 137 DOR cases (Dropped at Own Request) were tested, as
many as administratively possible during the period from about 1
January through 1 June 1953.

4. The norm group consisted of 2,667 adult men engaged in diversified
occupations, obtained from the manual for the Kuder Preference Record.

5. From published Air Force data, results were available for 937
wartime cadets, 721 of whom graduated primary training and 216 of whom
were eliminated (2).


Results 


Table 1 presents a summary of means and 
standard deviations for the nine interest areas 
measured on the groups considered. Table 2 
shows critical ratios testing the significance 
of the differences in mean interest scores for 
the groups compared. 
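
The report does not spell out how its critical ratios were computed. A minimal sketch, assuming the usual form (the difference between two independent means divided by the standard error of that difference, with unpooled variances), is given below; the function name `critical_ratio` and the two-tailed cut-offs of 1.96 and 2.58 are assumptions, not quotations from the report. The worked case uses the mechanical-interest figures for successful cadets and voluntary withdrawals, reading the successful-group mean in Table 1 as 85.8.

```python
from math import sqrt


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two independent means over its standard error."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error


def significance_label(cr):
    """Two-tailed normal-curve cut-offs (assumed): 1.96 for 5%, 2.58 for 1%."""
    size = abs(cr)
    if size >= 2.58:
        return "1% level"
    if size >= 1.96:
        return "5% level"
    return "not significant"


# Mechanical interest: successful cadets vs. voluntary withdrawals (DOR).
cr_mechanical = critical_ratio(85.8, 16.3, 137, 73.3, 20.9, 137)
print(round(cr_mechanical, 2), significance_label(cr_mechanical))
```

For this comparison the sketch gives a ratio of about 5.5, which agrees with the entry for the mechanical area in part B of Table 2. Comparisons involving the much larger norm group agree less closely, so the authors may have used slightly different inputs there.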

Comparison of Entering Cadets’ Interests 
to Kuder’s Norm Group. The norm group 
consists of “2,667 adult men engaged in oc- 
cupations, with each major occupational 
group weighted in proportion to its occur- 
rence in the general population (with the 
exception of unskilled and semi-skilled work- 
ers)” (3). 

On the average (Tables 1 and 2), entering 
cadets possess significantly different interests 
from those found for Kuder’s norm group in 
all nine interest areas measured. Entering
cadets are relatively more interested in scien- 
tific, artistic, musical, and mechanical ac- 
tivities and relatively less interested in cleri- 
cal, literary, social service, persuasive, and 
computational activities. 

Table 2


Critical Ratios Testing Significance of Difference in Mean Kuder Interest Scores †


       Mec      Com      Sci       Per      Art      Lit       Mus      SS       Cle
A. Entering Cadets versus Norm Group
C.R.   3.42**   2.81**   10.22**   3.20**   8.08**   14.97**   [...]    9.76**   21.38**
B. Successful Cadets versus Voluntary Withdrawals (DOR)
C.R.   5.52**   0.06     3.92**    3.46**   1.43     3.62**    2.32*    1.82     1.33
C. Entering Cadets versus World War II Air Force Cadets
C.R.   [...]    1.38     4.32**    3.61**   2.37*    11.25**   1.34     3.28**   9.02**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.
† Mec = Mechanical; Com = Computational; Sci = Scientific; Per = Persuasive; Art = Artistic; Lit = Literary; Mus = Musical; SS = Social Service; Cle = Clerical.


Another method of evaluating the difference in interests between the
two groups is gauged by the following procedure. The mean interest
scores for entering cadets and the norm group are placed on the
distribution of scores for the norm and percentile ranks obtained.
Percentile ranks obtained by this procedure are presented in Table 3.
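
The percentile procedure can be illustrated with a short calculation. The authors placed each mean on the actual norm distribution from the test manual, which is not reproduced in this report; the sketch below substitutes a normal curve with the norm group's mean and standard deviation from Table 1, so it is only an approximation. The function name `percentile_rank_normal` is hypothetical.

```python
from statistics import NormalDist


def percentile_rank_normal(score, norm_mean, norm_sd):
    """Percentile rank of a score, assuming normally distributed norm scores."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(score)


# Entering-cadet means placed on the norm group's distribution (Table 1 values).
scientific = percentile_rank_normal(70.7, norm_mean=64.0, norm_sd=15.5)
clerical = percentile_rank_normal(41.0, norm_mean=52.1, norm_sd=13.5)

print(f"Scientific: {scientific:.1f}")
print(f"Clerical:   {clerical:.1f}")
```

Under the normal-curve assumption the scientific and clerical means land within about a point of the ranks of 67 and 20 reported for entering cadets. Areas whose norm distributions are skewed would be expected to agree less well, which is the point the text makes about the mechanical area.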

In a perfectly normal distribution, the mean interest scores for the
norm group would all lie at percentile rank of 50, the median score.
Deviations from a percentile rank of 50 for the norm group suggest the
direction and degree of skewness for the norm distribution. Since all
percentile ranks for the norm group appear fairly close to 50, the
skewness, if significant, would not appear pronounced. Inspection of
the norm distribution for mechanical interest, where the mean score
approximates a percentile rank of 45, suggests that mechanical
interest scores are slightly skewed toward the high end of the
distribution. This explains an apparent contradiction whereby entering
cadets show a mean mechanical interest score equivalent to a
percentile rank of 50 on the norm distribution and, at the same time,
show a significantly greater interest in the mechanical area than the
norm.


Table 3


Percentile Ranks of Interests for Entering Naval Cadets and Norm Group


                                               Difference
                  Entering                     Between Entering
Interest Area     Naval Cadets   Norm Group    and Norm Group
Mechanical        50             45            +5
Computational     45             49            -4
Scientific        67             50            +17
Persuasive        48             52            -4
Artistic          65             54            +11
Literary          30             54            -24
Musical           65             57            +8
Social Service    30             50            -20
Clerical          20             52            -32


The extremity of the differences between entering cadets and the norm
group is emphasized for the clerical, literary, and social service
areas. Entering cadets seem pre-selected with respect to a relative
dislike for activities of reading or writing (literary), routine
filing or secretarial work (clerical), and activities which contribute
to the welfare of people (social service). To a lesser extent, they
are pre-selected with respect to a relative liking for activities of
the scientific, artistic, and musical interest areas.

Since the norm group is presumably older than the cadet group, it may
not be concluded that these differences are all characteristic of
Naval Cadets as a vocational group. Some of the differences could
reflect changes in interest characteristic of an older age group.
Furthermore, cadets undoubtedly represent a population with more
education than does the norm group. Thus some of the differences in
interests could be a reflection of educational level which
distinguishes the two groups, aside from vocational selection. When
these factors are better controlled, it will be possible to isolate
which of the interest areas reflect those characteristics of a
vocational group and not those for age or educational groupings.



It would seem reasonable that a preference 
for the scientific area would be the one area 
most likely to be truly characteristic of Naval
Aviators as opposed to vocationally unse- 
lected groups. 

Comparison of Successful Cadets to Volun- 
tary Withdrawals (DOR). Differences in in- 
terests between the above two groups sug- 
gest the possible usefulness of the Kuder 
Preference Record as a predictor of DOR 
attrition. As can be noted from Tables 1 and 
2, successful cadets are significantly more in- 
terested in mechanical and scientific activi- 
ties than DOR’s. They are significantly less 
interested in persuasive, literary, and musical 
interests than DOR’s.

From these results, the interest picture for the successful cadet is
an individual who has a positive attraction toward activities which
involve the use of tools and machinery; he also likes abstract and
theoretical activities of a scientific nature. The DOR appears to be
an individual who is more interested in activities which involve
convincing people (persuasive), reading or writing, and appreciation
of or participation in musical activities; he is less attracted by
mechanical and scientific activities.

In this connection, it should be recalled that entering Naval Aviation
Cadets are selected with respect to mechanical aptitude since cadets
with very low Mechanical Comprehension Test scores are not admitted to
the training program. These data indicate that mechanical interest,
aside from mechanical aptitude, is important for successful completion
of the Naval Air Training Program. Further study will be made to
evaluate the improved prediction of DOR attrition when aptitudes and
interests are both considered.

Comparison of Entering Cadets’ Interests with Air Force Population.
Critical ratios (Table 2) reveal some important differences in
interest between the above two groups. Entering cadets’ interests
differ significantly from the Air Force entering cadet population in
all areas with the exception of computational and musical activities.
The differences between the two groups are particularly pronounced for
the literary, clerical and mechanical areas.


Inspection of the mean scores for the Air 
Force eliminees from training reveals that 
differences in interests between the two at- 
trition groups are considerable.¹ The Naval
Cadet who withdraws voluntarily shows es- 
sentially a different interest pattern from the 
Air Force cadet who was eliminated from 
training during World War II. The reasons 
for this difference are not very clear, aside 
from motivation present during World War 
II which is not so pronounced today. How- 
ever, the important fact is that these two 
populations are different—at least with re- 
spect to interests. Thus if a test did not 
show validity on the Air Force population of 
World War II, this does not necessarily pre- 
clude its being valid for present day Naval 
Aviators. The attempt to use the Kuder in 
this study was undertaken despite Air Force 
data which showed it to be invalid for pre- 
dicting pass-fail during World War II (2). 


Discussion 


It will be recalled that successful cadets 
as compared with DOR groups possess higher 
mean interest scores for mechanical and scien- 
tific areas and lower for the persuasive, liter- 
ary, and musical areas. The mean interest 
scores for entering Naval Aviation Cadets lie 
between those found for successful and DOR 
groups for mechanical, literary, and musical 
interest areas (Table 1). These findings for 
the entering group are consistent with the 
assumption that selective drop-out from train- 
ing caused the significant differences noted 
between successful and DOR groups. How- 
ever, the scientific interest area deserves spe- 
cial comment since the mean scientific inter- 
est score for the successful group is 63.4, for 
the DOR group 61.1, but for entering cadets 
70.7. Although entering cadets are more like the successful than the
DOR, for a definite trend to be present, the mean interest scores for
entering cadets should lie between the means for successful and DOR
groups. The same reasoning applies for persuasive interest where the
mean score for the entering group does not lie between the DOR and
successful groups.

1 Mean interest scores for Air Force eliminees from training differ by
only a small fraction of a point from those for the Air Force
graduates, with the exception of artistic and social service interests.
Eliminees are about 2.0 and 2.5 points higher and lower in these two
interest areas respectively. Thus, interest comparisons may be made
directly to the total Air Force population means with little loss of
accuracy as compared to the eliminees from this population.

It is possible that entering cadets tend to 
over-rate their interest in scientific activities. 
Having just reported to Naval Air Training, 
it is conceivable that they would tend to rate 
themselves higher in scientific interest merely 
because they feel they should be high in this 
interest. 

Since further “cross-validation” will be ap- 
plied to these data in any case, an empirical 
check will be made for those interest areas 
apparently important for successful comple- 
tion of Naval Air Training. Based on the 
differences between successful and attrition 
cases, weights will be given to the interests 
that distinguish the two groups. From these 
weights, predictions of pass or DOR attrition 
will be made for entering cadets. In time, 
cadets who actually voluntarily withdraw and 
those who succeed will be determined. These 
results will be checked against the predictions 
made, and the actual utility of the Kuder 
Preference Record for predicting DOR cases 
will be ascertained. From the results pre- 
sented in this report, it seems very likely that 
the measured vocational interests of entering 
cadets will predict DOR attrition significantly 
greater than chance expectation. 


Summary 


The vocational interests of cadets would 
seem important for successful completion of 
Naval Air Training. Therefore, the Kuder 
Preference Record, a measure of relative pref- 
erence for nine broad vocational interest areas, 
was administered to 651 entering Naval Aviation Cadets, 137 DOR
attrition cases (voluntary withdrawals from training), and 137
“successful” cadets. The successful cadets 
were tested near completion of their basic 
training; from previous experience it is esti- 
mated that over 90 per cent of these cadets 
will graduate. 


Results indicate: 


Nathan Rosenberg and Carroll E. Izard 


1. Entering cadets show significantly more interest in scientific,
artistic, musical, and mechanical activities than a vocationally
unselected population. They are less interested in clerical, literary,
social service, persuasive, and computational activities.

2. Successful cadets are relatively more interested in mechanical and
scientific activities as compared to a group who withdrew from training
at their own request. They are less interested in persuasive, literary,
and musical activities than the voluntary withdrawal cases.

3. The voluntary withdrawal group shows an essentially different
interest pattern from the group eliminated from training in the Air
Force during World War II.

It is concluded: 


1. Entering cadets have interest patterns which are different from
those found for a vocationally unselected group. Some of the
distinguishing interests may arise from the cadets' age or educational
level rather than choice of Naval Aviation as a vocation. The factor of
selection screening, as well as self-selection on the basis of
interests, may have partially determined these interest patterns.

2. The Kuder Preference Record shows promise of validity for predicting
DOR attrition. The mechanical, scientific, persuasive, literary, and
musical interest keys appear the most important for this purpose.

3. Some psychological tests which failed to predict attrition for World
War II Air Force cadets may show validity for present day Naval
Aviators.


Received October 23, 1953. 


References 


1. Bair, J. T. and Ambler, R. K. Expressed i[...] and background
characteristics for Naval Aviation Cadets withdrawing voluntarily [...]
January 1953. U. S. Naval School of Aviation Medicine, Special
Report—Attrition Report No. 6, February 1953.

2. Guilford, J. P. and Lacey, J. I. (Eds.) Printed Classification
Tests. Army Air Force Aviation Psychology Program Research Reports,
Report Number 5, 1947.

3. Kuder, G. F. Revised Manual for the Kuder Preference Record.
Chicago: Science Research Associates, 1946.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 5, 1954


Coding the Kuder Preference Record—Vocational * 


Robert Callis, William C. Engram, and John F. McGowan 


University Counseling Bureau, University of Missouri 


Underlying the use of vocational interest tests in vocational planning
is the assumption that if a person's interests are similar to the
interest of people in occupational groups who have experienced a high
degree of satisfaction in their work, he will derive most satisfaction
doing the same or similar kind of work. That is, if a person's
interests in “common-everyday things” are most similar to those of
engineers, there is a high probability that he will derive more
satisfaction from working as an engineer or some closely related
occupation than he would from other occupations. There has been
considerable research to substantiate this proposition. In order,
therefore, for a counselor to be effective in the interpretation of his
client's interests as measured by tests, he needs some sort of guide to
aid him in giving the client a comparison of his interests with that of
various occupational groups.
This paper presents a guide for the counselor to use in interpreting
the individual profile of the Kuder Preference Record—Vocational
(Kuder PR-V). In order to facilitate a meaningful interpretation of the
test results to the client, it is often better for the counselor to
speak in terms of several occupational fields in addition to
descriptive terms (such as mechanical, artistic, persuasive), the
meaning of which is often vague to the client. There is, therefore, a
need for a grouping of occupations based on real test data which the
counselor may feel confident in using.
Kuder (12) grouped specific occupations under the various scale
headings of his inventory. However, many counselors have been reluctant
to interpret the client's profile in terms of Kuder's groupings of
occupations because many of the groupings were not supported by
empirical data. During the past few years Kuder and many others have
reported a considerable amount of empirical data about various
occupational groups which can be used to group occupations according to
interest test profiles. However, some of the discrepancies between
Kuder's grouping (12, Table 1) and his empirical data (12, Tables 2 and
3) are rather striking. Wiener (20) cited as an example of one of these
discrepancies Kuder's “39” listing (Scientific and Clerical interests)
as including the occupation of pharmacist. Looking at actual test
results for a group of “pharmacists and drug store managers,” however,
one sees a significant elevation on scale 3 (Scientific) and only an
average score on scale 9 (Clerical).

* A revisi[...] of this report which included Tables 1 through 4 of the
present paper is available from the authors at a cost of 50 cents per
copy.

We find, as another example, that Kuder 
lists “Author; editor; reporter” under the 
categories of “4” (Persuasive), “6” (Liter- 
ary), “36” (Scientific-Literary), “46” (Per- 
suasive-Literary), “67” (Literary-Musical) 
and “68” (Literary-Social Service). From 
empirical research, Mathewson and Herbert 
(14) found that only the “67” category was 
the pattern for their group of 113 author- 
journalists. 

Also, one is not justified in saying, as Kuder 
(12) implies, that a person should score high 
on the mechanical scale in order seriously to 
consider engineering as a career. Chemical, 
civil, electrical, and sales engineers as groups 
do not have mean scores on the mechanical 
scale above the 65th percentile rank. Mechanical engineers and some
industrial engineers did score significantly high on the average on
the mechanical scale. The mean score
of all professional engineers on this scale is 
below 65 P.R. (12, Table 2). Actually the 
interest typical of the large majority of engi- 
neers is characterized by significant elevations 
on scales 2 and 3 (Computational and Scientific).

Other discrepancies are apparent after comparing Kuder's groupings with
empirical results. Thus, many of Kuder's original “logical” groupings
now can be replaced by the increased body of empirical data. As a
result a counselor can operate more effectively when he can base his
test interpretation on real data.

In order to make this information based on 
actual test data usefully available to the 
counselor it is necessary to have it organized 
into some system. Wiener (20) proposed a 
coding system which coded each individual 
score over the 75th percentile rank. How- 
ever, Frandsen (8) criticized Wiener’s system 
as not being comprehensive enough; that is, 
the 75th percentile rank was too rigorous a 
cutting point. Instead of using only the scores above the 75th
percentile rank, Frandsen suggested a coding system that would include
deviations outside the 65th to 35th percentile rank range, and thus
gain much better differentiation among the various occupations. By
such a system, there would
be less frequency of finding that the code 
for an individual’s profile matches identically 
the codes of many different occupational 
groups. Also, a mean score which falls out- 
side the 65th to 35th percentile range would 
be a significant deviation for almost any rea- 
sonably sized group. 

Diamond (5) has shown how the use of a uniform cutting score for the
Kuder scales is misleading and not in keeping with the reality of the
occupational world as revealed in the census data. He notes that “not
25 per cent of employed urban men, but approximately 40 per cent, are
engaged in occupations of a mechanical nature. It is, therefore, a
statistical absurdity to expect that all men who enter the mechanical
field shall have mechanical interest above the 75th percentile rank.”
On the other hand, there are some interest 
fields which employ only a fraction of one per 
cent of the labor force. Music is such a field. 
In this connection, Diamond points out that 
a musician who scores at the 75th percentile 
rank on the Kuder PR-V musical scale is 
more than two standard deviations below the 
mean of his occupational group. 


R. Callis, W. C. Engram, and J. F. McGowan 


So far, relatively little attention has been given to significantly low
scores on interest test scales. The low scores may be equally useful in
characterizing the interest of an occupational group as are high
scores. For example, most engineers score significantly low on the
social service scale.

It appears, therefore, that a system for coding Kuder PR-V profiles
which would reflect the low as well as the high scores would be quite
helpful in studying the kind of interest which is typical of various
occupational groups. Such a system should provide a code for a profile
which is short and simple but which preserves a maximum amount of
information.

A coding system is proposed here which meets Frandsen's (8) objections
to Wiener's (20) system. It is similar to the system reported by
Hathaway (10) and Holland et al. (11).


Coding Procedure 


To code a profile, follow these steps. As an example we will use the
percentile ranks corresponding to mean raw scores on the various scales
of the Kuder PR-V made by a group of surgeons (12). The first number
denotes the scale and the number after the dash denotes the percentile
rank:

Out   Mec   Com   Sci   Per
0-68, 1-45, 2-25, 3-75, 4-27,
Art   Lit   Mus   Soc   Cle
5-61, 6-66, 7-62, 8-48, 9-25

Step 1. Select all scores of 75 P.R. and above and list the scale
number in descending order of magnitude of the percentile rank. Then
place an apostrophe after these. Example: 3'.

Step 2. Select all scores between 74 P.R. and 65 P.R. inclusive and
list the scale numbers in descending order of magnitude of the
percentile ranks next after the apostrophe. Then place a dash after
these. Example: 3'06—.

Step 3. Select all scores 25 P.R. and below and list the scale numbers
in ascending order of magnitude of the percentile ranks. Then place an
apostrophe after these. Example: 3'06—29'.

0-26.10        Surgeon          52        3'06—29'4
(D.O.T.)       (job title)      (N)       (Code)

Kuder PR-V, Form C, (P.R.):
0-68, 1-45, 2-25, 3-75, 4-27, 5-61, 6-66, 7-62, 8-48, 9-25, V-

Reference: Kuder, F. G., Examiner Manual for the Kuder Preference
Record—Vocational, Form C. Chicago, Ill.: Science Research Associates.
February 1953. Table 2.

Notes: (description of the group, evaluation of the data, etc.)

Fig. 1. Example of card showing codes.


Step 4. Select all scores 26 P.R. to 35 P.R. inclusive and list the
scale numbers in ascending order of magnitude of the percentile ranks
next after the apostrophe. Example: 3'06—29'4.

Step 5. Place the V-Score in parentheses after the entries so far. This
applies primarily to the coding of profiles of individuals.
Example: 3'06—29'4 ( ).
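
The five steps can be sketched in code. What follows is only a minimal illustration under stated assumptions, not part of the published procedure: the function name `code_profile` is hypothetical, tied percentile ranks are listed in scale-number order, both apostrophes are written even when a segment is empty, and a plain hyphen stands in for the printed dash.

```python
def code_profile(percentiles, v_score=None, dash="-"):
    """Code a Kuder PR-V profile given {scale number: percentile rank}.

    Assumptions (not stated in the text): ties are broken by scale
    number, and both apostrophes are always written.
    """
    def scales(keep, descending):
        picked = [(scale, pr) for scale, pr in percentiles.items() if keep(pr)]
        # Sort by percentile rank in the required direction, then by scale number.
        picked.sort(key=lambda sp: (-sp[1] if descending else sp[1], sp[0]))
        return "".join(str(scale) for scale, _ in picked)

    code = (
        scales(lambda pr: pr >= 75, True) + "'"          # Step 1: high
        + scales(lambda pr: 65 <= pr < 75, True) + dash  # Step 2: near high
        + scales(lambda pr: pr <= 25, False) + "'"       # Step 3: low
        + scales(lambda pr: 25 < pr <= 35, False)        # Step 4: near low
    )
    if v_score is not None:                              # Step 5: V-Score
        code += f" ({v_score})"
    return code


# Percentile ranks for the group of surgeons used as the example in the text.
SURGEONS = {0: 68, 1: 45, 2: 25, 3: 75, 4: 27, 5: 61, 6: 66, 7: 62, 8: 48, 9: 25}
print(code_profile(SURGEONS))
```

For the surgeons' percentile ranks this reproduces the worked example of Steps 1 through 4, with the hyphen in place of the dash.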

It is proposed that any serious user of the Kuder PR-V prepare codes
for all occupational groups for which profile data are available, such
as those reported in the Manual (12, Tables 2 and 3) and in various
journal articles. Then a duplicate set of cards should be prepared for
each occupational group. (See example.) One set of cards should be
filed numerically according to code number and the other set
alphabetically according to job title. Data for men and women should
either be filed separately or on different colored cards.

The Dictionary of Occupational Titles (D.O.T.) code number is for cross
reference to that system of classifying occupations.

Once these two files are prepared, any given profile can be coded and
referred to the code file for a list of job titles which have similar
codes. Also, the job title file can be searched to determine if the
codes of the occupations being considered by the client agree
reasonably well with the code of the client's profile. As new data
become available, appropriate cards can be prepared and inserted in the
card files.
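
The two card files amount to two indexes over the same set of cards. The sketch below is one possible arrangement, offered only as an illustration: the `Card` class, the `build_files` helper, and the second card are hypothetical, a plain hyphen stands in for the printed dash, and the lookup matches identical codes only, since the text does not define how "similar" codes are to be judged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    dot_code: str   # Dictionary of Occupational Titles number
    job_title: str
    n: int          # number of subjects in the group
    code: str       # profile code (hyphen used for the dash)


def build_files(cards):
    """Return (code_file, title_file) for a collection of cards."""
    code_file = defaultdict(list)
    for card in cards:
        code_file[card.code].append(card)
    # Title file: keyed by lower-cased job title, in alphabetical order.
    ordered = sorted(cards, key=lambda c: c.job_title.lower())
    title_file = {card.job_title.lower(): card for card in ordered}
    return dict(code_file), title_file


CARDS = [
    Card("0-26.10", "Surgeon", 52, "3'06-29'4"),           # from the example card
    Card("0-00.00", "Hypothetical group", 10, "'83-'94"),  # invented for illustration
]
CODE_FILE, TITLE_FILE = build_files(CARDS)

# Look up job titles filed under a client's code, then check one occupation.
print([card.job_title for card in CODE_FILE.get("3'06-29'4", [])])
print(TITLE_FILE["surgeon"].code)
```

Separate files for men and women, as recommended above, could be kept by calling `build_files` once per group.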


Use of such a coding system facilitates more valid use of the Kuder
PR-V by bringing real data to bear on the interpretation of a profile
rather than basing interpretation on “logical” guesses which have
proved fallible
in the past. It may be desirable to extend 
the code file to include individual cases so 
that the counselor may then refer to his own 
case records of individuals as well as occupa- 
tional groups for aid in interpretation. 

From the various sources of research, four 
tables of data have been compiled. Table 
1 (M-code) lists the various male occupa- 
tional groups according to the numerical 
value of their codes. The number of sub- 
jects in each group and the reference to the 
original data are also given. Table 2 (M- 
alphabet) lists the various male occupational 
groups alphabetically by job title. Table 3 
(F-code) lists the various female occupa- 
tional groups according to the numerical code 
value. Table 4 (F-alphabet) lists the vari- 
ous female occupational groups alphabeti- 
cally by job titles.* 

It must be remembered that knowledge that an individual has interest
similar to a particular occupational group does not insure that he will
be successful or even satisfied in that occupation. The power of Kuder
PR-V to predict job success or satisfaction is largely unknown.
However, use of a system such as proposed here will bring us one step
closer to prediction of job satisfaction and possibly, to a lesser
degree, job success.

1 Tables 1 through 4 have been deposited with the American
Documentation Institute. Order Document No. 4322 from the ADI Auxiliary
Publications Project, Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting in advance $1.75 for 35 mm. microfilm
or $2.50 for 6 X 8 in. photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

There is a limitation in the use of codes of 
interest profiles when based on mean scores 
which should be borne in mind. If the inter- 
est of an occupational group is highly homo- 
geneous, a single code will reflect this interest 
pattern quite accurately. However, if the in- 
terest of an occupational group is quite 
heterogeneous, a single code will not reflect 
the interest of that group accurately. Several 
codes, each based on a homogeneous sub- 
group, may be required to reflect accurately 
the interest of an occupational group. 

An example of an occupational group which has heterogeneous interests
is “secondary school teachers.” The code for “all male secondary school
teachers” and that of several of the sub-groups can be contrasted as
follows:

all secondary school teachers (male)      8—14
commercial teachers (male)                9'23
mathematics teachers (male)               23—49
social studies teachers (male)            8'6—1'5
music teachers (male)                     7'6—123'
vocational training teachers (male)       15—47


An example of an occupational group which appears to have homogeneous
interests is “nurses.” The codes for “all trained nurses” and several
sub-groups are as follows:

all trained nurses (female)               83—94
nurse educators (female)                  83—94
general staff nurses (female)             '83—'94
private duty nurses (female)              83—94
public health nurses (female)             8—o
supervisors and head nurses (female)      '8—94

Researchers are urged to investigate and 
report on the “homogeneity of interest” when 
reporting on the interest of any occupational 
group. This may be accomplished by re- 
porting the frequency of various codes which 
members of the group achieve. However, the establishment of a code
frequency distribution for an occupational group is often a difficult
task. It can be done in several ways,
the first of which might well be coding the 
mean scores. A second way might be a 
tabulation of how many persons in a group 
had scores on the various scales coded in 
different parts of the code; i.e., high (75 
P.R. and above), near high (65-74 P.R.), low 
(25 P.R. and below), near low (26-35 P.R.), 
and not coded (36-64 P.R.). Table 5 is 
such a tabulation for 62 students in a nursing education program. It
can be seen from Table 5 that 54 of the 62 student nurses scored 75
P.R. or above on scale 8 (social service) and none of them scored as
low as the 35 P.R. Ninety-five per cent of this group scored 65 P.R. or
above on scale 8. Thus we see that a reasonably high score on scale 8
is typical for almost all student nurses in this group. By similar
analysis we can find other characteristics of our group, such as a low
score on scale 9 (clerical).
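
A tabulation of this kind is easy to produce once each score is assigned to its part of the code. The following is a minimal sketch using the five percentile-rank bands given above; the function names `position` and `tabulate` and the three sample profiles are hypothetical and are not data from the study.

```python
from collections import Counter


def position(pr):
    """Return the part of the code in which a percentile rank falls."""
    if pr >= 75:
        return "high"
    if pr >= 65:
        return "near high"
    if pr <= 25:
        return "low"
    if pr <= 35:
        return "near low"
    return "not coded"


def tabulate(profiles):
    """Count, for each scale, how many persons fall in each part of the code."""
    table = {}
    for profile in profiles:
        for scale, pr in profile.items():
            table.setdefault(scale, Counter())[position(pr)] += 1
    return table


# Three invented profiles, restricted to scales 8 and 9 for brevity.
SAMPLE = [{8: 90, 9: 20}, {8: 70, 9: 30}, {8: 80, 9: 50}]
TABLE = tabulate(SAMPLE)
for scale in sorted(TABLE):
    print(scale, dict(TABLE[scale]))
```

Each row of the resulting table sums to the number of persons in the group, which gives a simple check on the counts.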

The actual frequency with which any particular code occurs in a group
is probably the most precise way in which to describe the interest of
the group. However, this method does not lend itself well to the making
of summary statements about a group. Of the 62 student nurses mentioned
above, 18 of them achieved codes which contained an 83—94 code. That
is, they may have had other scales coded high or low but scales 8 and 3
were coded high and scales 9 and 4 were coded low. Forty-one of the 62
student nurses had scales 8 and 3 coded high without regard for how
other scales were coded. It required ten different codes or code
variations to account for all cases in this group. However, eight of
these ten codes were merely variations of the 83—94 code.


Table 5

Frequency of Scores Appearing in the Various Parts of the Code for 62
Students in a Nursing Education Program

                               Position in the Code
Kuder PR-V Scale     High   Near High   Not Coded   Near Low   Low
0 Outdoor             32      [...]        10         [...]      9
1 Mechanical          15      [...]        20         [...]     15
2 Computational       12      [...]        15         [...]     26
3 Scientific          40      [...]        12         [...]      3
4 Persuasive           8      [...]        15         [...]     28
5 Artistic             6      [...]        19         [...]     18
6 Literary            10      [...]        16         [...]     19
7 Musical             17      [...]        20         [...]     12
8 Social Service      54      [...]         3         [...]      0
9 Clerical             2      [...]        10         [...]     41


Summary

A proposal has [...] Ways of using codes in interpreting profiles have
been presented. Finally, the effect of heterogeneity of interest within
a group upon the use of a single code to describe the interest of that
group was discussed as a limitation and a caution in the use of codes.

Received October 2, 1953.


References

1. Baas, M. L. A study of patterns among professional psychologists.
Unpublished M.A. thesis, Purdue University, 1949.

2. Baas, M. L. Kuder interest patterns of psychologists. J. appl.
Psychol., 1950, 34, 115-117.

3. Beamer, G. C., Edmonson, L. D., and Strother, G. B. Improving the
selection of linotype trainees. J. appl. Psychol., 1948, 32, 130-134.

4. Capwell, Dora J. Psychological tests for retail store personnel.
Pittsburgh: Research Bureau for Retail Training, University of
Pittsburgh, [...].

5. Diamond, S. The interpretation of interest profiles. J. appl.
Psychol., 1948, 32, 512-520.

6. DiMichael, S. G. The professed and measured interests of vocational
rehabilitation counselors. Educ. psychol. Measmt, 1949, 9, 59-72.

7. Eimicke, V. W. Kuder Preference Record norms for sales trainees.
Occupations, 1949, 28, 5-10.

8. Frandsen, A. N. A note on Wiener's coding of Kuder Preference Record
profiles. Educ. psychol. Measmt, 1952, 12, 137-139.

9. Hahn, M. E. and Williams, C. T. The measured interests of Marine
Corps women reservists. J. appl. Psychol., 1945, 29, 198-211.

10. Hathaway, S. R. A coding system for MMPI profiles. J. consult.
Psychol., 1947, 11, 334-337.

11. Holland, J. L., Krause, A. H., Nixon, M. E., and Trembath, M. F.
The classification of occupations by means of Kuder interest profiles.
J. appl. Psychol., 1953, 37, 263-269.

12. Kuder, F. G. Examiner Manual for the Kuder Preference
Record—Vocational, Form C, Second Revision. Chicago: Science Research
Associates, 1951.

13. Lewis, J. A. Kuder Preference Record and MMPI scores for two
occupational groups. J. consult. Psychol., 1947, 11, 194-202.

14. Mathewson, R. H. and Herbert, R. Kuder Preference Record Profiles
for 48 occupational fields in six major groups. Cambridge: The Guidance
Center, 1949.

15. Rundquist, R. M. (personal communication). December, 1951.

16. Shaffer, R. H. The measured interests of business school seniors.
Occupations, 1949, 27, 462-465.

17. Speer, G. S. The Kuder interest test patterns of fire protection
engineers. J. appl. Psychol., 1948, 32, 521-526.

18. Triggs, Frances O. Kuder Preference Record in the counseling of
nurses. Amer. J. Nursing, 1946, 46, 312-316.

19. Triggs, Frances O. The measured interests of nurses. J. educ. Res.,
1947, 41, 25-34.

20. Wiener, D. W. Empirical occupational groupings of Kuder Preference
Record profiles. Educ. psychol. Measmt, 1951, 11, 273-279.


THE JOURNAL or APPLIED PSYCHOLOGY 
Vol. 38, No. 5, 1954 


Transfer of Training in Tracking as a Function of Control 
Friction * 


F. A. Muckler and W. G. Matheny 


University of Illinois 


The degree to which a training device 
should simulate the final task is of consider- 
able practical interest to those concerned with 
training as well as to the manufacturers of 
training devices. A training device may simu- 
late a psychomotor task to a greater or lesser 
degree along several dimensions. One of 
these dimensions is the control force neces- 
sary for accomplishing the task. The ques- 
tion becomes: what degree of fidelity of simu- 
lation of control force is necessary in the 
training device in order to secure optimum 
transfer of training? 

The present study was designed to investi- 
gate the effect of varying control friction 
upon transfer of training in a visually guided 
tracking task. 

Despite a considerable literature, the ex- 
perimental evidence on the effect of friction 
in control mechanisms is not clear cut. In 
general, friction has been found to be un- 
desirable, but the effect is a function of such 
variables as: (a) the type of friction involved 
(4, 7, 9, 13); (b) the tracking task (6, 8); 
(c) the presence or absence of inertia (7, 11); 
(d) the radii when handwheels or knobs are 
used (13, 14); and (e) the Tesponse measure 
recorded (6, 10, 15). Further, the effect of
friction may be specific to complex interac- 
tions of many of these variables (7, 14). 

All of these studies are concerned with either original learning or
performance situations while the question of transfer from one control
friction to a different control friction remains relatively
uninvestigated. In a study summarized by Craik (2) and reported by
Vince (16), subjects were trained to make corrections with a lever
operated against a stiff spring. After the subjects were making
accurate movements, the spring tensions were changed. The new response
was found to be delayed by at least 0.16 second; this time interval was
termed the “kinesthetic reaction time.” More directly applicable is the
experiment reported by Bilodeau (1). Two groups rotated a crank handle
at either heavy or light loads for five minutes. A third group
practiced first under a light load and then under a heavy load
alternately for one minute periods for five minutes. The fourth group
started under a heavy load changing to a light load under the same
procedure. Of interest here is the fact that when the last two groups
were shifted, “rate output was approximately equal to that of
non-shifting groups” (1, p. 100). These data are interpreted here to
imply that there is no specific effect of previous practice on either a
heavy or light load to the performance of the task under the light or
heavy load, respectively.

1 This research was supported in part by the United States Air Force
under Contract AF 33(038)-25726, monitored by the Air Force Personnel
and Training Research Center. Permission is granted for reproduction,
translation, publication, use and disposal in whole or in part by or
for the United States Government. We should like to thank Dr. L. H.
Lanier, Dr. A. C. Williams, and Dr. W. E. Kappauf for their valuable
suggestions and criticisms.

In this experiment, the effect of changing friction upon the level of
performance in a visually guided tracking task was investigated.
Experimental evidence was sought for a change from a higher friction to
a lower friction, from a lower friction to a higher friction, and from
a “frictionless” condition to a friction system.


Experimental Method

Experimental [...]. Pursuit tracking [...] track a contin[...] moving
roll of [...] a horizontal slit [...] type pointer. [...]ing responses
were recorded in the form [...] response line on the paper with [...].
[...] in the system could be varied systematically by means of a brake
drum [...] to the control lever. The friction was independent of both
rate and extent of control movement. Further, there was no “centering”
tendency of the control lever.
Procedure. Each subject was given the following instructions:

This instrument is called a tracking device. In this opening (point)
there will appear a moving line, which will go back and forth across
the opening. This (point) is the control handle. As you move the handle
forward, this pointer (show) will move to the right; as you move the
handle backward, the pointer will move to the left. Now the pointer
that you control will make a mark on the moving paper. Your job will be
to match the mark you make with the line that is presented to you.
Please use only one hand, the hand you start with. Are there any
questions?

To reduce fatigue effects, the subjects were given a two-minute rest
period after the completion of twenty cycles. After the subject had
reached criterion on the original learning task, he was sent from the
room while the control friction was changed. The time from completion
of original learning to the beginning of the transfer trials was, in
all cases, two minutes. To observe the effect of the two-minute break,
the control group was given a two-minute rest and then continued the
task with the same friction load.

Criterion. The subjects were said to have learned the pattern when they
did not deviate more than two millimeters from the stimulus line for
three successive cycles. A trial was defined as one sine wave cycle.
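
The criterion can be stated as a short scoring rule. The sketch below is illustrative only: the function name and the list of per-cycle maximum deviations are hypothetical, and it assumes that the three criterion cycles themselves are not counted as trials, an assumption suggested by the control group's transfer score of 0.0 in Table 1 but not stated explicitly in the text.

```python
def trials_to_criterion(max_deviation_mm, tolerance_mm=2.0, run_length=3):
    """Number of trials (cycles) preceding the first criterion run.

    max_deviation_mm: largest deviation from the stimulus line on each
    successive cycle, in millimeters. Returns None if the criterion of
    `run_length` successive cycles within tolerance is never met.
    """
    run = 0
    for index, deviation in enumerate(max_deviation_mm):
        run = run + 1 if deviation <= tolerance_mm else 0
        if run == run_length:
            # Exclude the criterion cycles themselves (assumption).
            return index + 1 - run_length
    return None


# Invented record: criterion is met on cycles 6 through 8.
print(trials_to_criterion([5.0, 3.0, 1.0, 1.0, 4.0, 1.0, 2.0, 1.5]))
```

Under this counting rule a subject who is within tolerance from the first cycle onward scores zero trials.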

Experimental design. Seven experimental groups were assigned: 0
(approximately 2.5 ounces), 2, 4, 6, 8, 10, and 12 pounds. The basic
design was the familiar paradigm cited by Woodworth (17) as Plan 4:

Transfer group    Learns A . . . Learns B
Control group     . . . . . . .  Learns B

The control group selected was the six-pound friction group. Thus,
three groups (8, 10, and 12 pounds) transferred to the lower control
pressure of six pounds. The groups 0, 2, and 4 pounds transferred to
the higher control pressure of six pounds. The basic design is the same
in all cases.
Measurement of transfer is recorded in per cent savings of trials. The
formula used is from Gagné, Foster, and Crowley (5):

\[
\text{Per cent transfer} =
\frac{\text{Control group score} - \text{Transfer group score}}
     {\text{Control group score} - \text{Total possible score}}
\times 100 .
\]

Since the response measure used was the number of trials to criterion,
the total possible score could be reduced to zero.
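
As a worked check on the formula, the short sketch below computes per cent transfer from mean trials to criterion. The function name is hypothetical; the input values are the means reported in Table 1, with the six-pound group's original learning score (27.3 trials) taken as the control group score, which is an interpretation of the design rather than something the text states in these words.

```python
def percent_transfer(control_score, transfer_score, total_possible_score=0.0):
    """Per cent savings of trials relative to the control group."""
    return (
        (control_score - transfer_score)
        / (control_score - total_possible_score)
        * 100.0
    )


CONTROL = 27.3  # six-pound group, mean trials in original learning (Table 1)

print(round(percent_transfer(CONTROL, 3.8)))      # 0-pound group
print(round(percent_transfer(CONTROL, 2.8), 1))   # 12-pound group
print(round(percent_transfer(CONTROL, 0.0)))      # control group itself
```

With the total possible score set to zero the expression reduces to the proportion of control-group trials saved, so a transfer score of zero trials gives 100 per cent.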




Subjects. A total of 105 Air Reserve Officer 
Training Corps Cadets were used. The age 
range was 17 to 25 years with a mean of 19.6 
years. The subjects were assigned at random, 
on the basis of a table of random numbers (3), 
so that each experimental group contained 15 
subjects. One restriction was placed on the 
randomization, namely, the subjects were as- 
signed in blocks of seven. 


Results 


Original Learning. The mean number of 
trials to reach criterion is shown in Table 1 
for each experimental group. Since Bartlett’s 
test for homogeneity (3) showed the vari- 
ances of these scores to be homogeneous, and 
since the distribution of trials was found to 
be “moderately” normal, an analysis of
variance was computed. There were no sta- 
tistically significant differences between the 
experimental groups in original learning. 

However, since the distributions did show 
some skewness, confirmation of the analysis 
of variance result was sought by the use of 
a distribution-free technique described by 
Mood (12) as “simple linear regression.” 
The application of this test gave results com- 
pletely in accord with those obtained from 
the analysis of variance. 

Transfer of Training. The mean number 
of transfer trials necessary to reach criterion 
for every experimental group is shown in 
Table 1. Per cent transfer of training was 
computed on the basis of the formula men- 
tioned previously. In Figure 1, per cent 
transfer of training is shown as a function of 
control friction. Individual transfer points
are: 0 pounds, 86 per cent; 2 pounds, 91 per
cent; 4 pounds, 90 per cent; 6 pounds (control),
100 per cent; 8 pounds, 93 per cent;
10 pounds, 90.8 per cent; and 12 pounds,
89.7 per cent.


Table 1

Mean Number of Trials to Reach Criterion

Experimental    Original    Transfer
Groups          Learning    Learning

0               28.6        3.8
2               25.5        2.5
4               27.0        2.7
6               27.3        0.0
8               25.6        1.4
10              22.0        2.5
12              24.9        2.8


Fig. 1. Per cent transfer as a function of control
friction. (Abscissa: control friction, lbs.; ordinate: per cent transfer.)

It will be recalled that the control group 
(6 pounds friction) was given a two-minute 
break after original learning and then con- 
tinued on the same task as may be seen in 
Table 1. There was no decrement of per- 
formance observed; the criterion level was 
maintained. This result may be interpreted 
as 100 per cent positive transfer and will be 
found, as such, in Figure 1. 

Ignoring the 6-pound control group, a test
was made of the significance of differences
between the experimental groups on raw
transfer scores. Since the distribution of
transfer trials was highly skewed, the results
were evaluated by the distribution-free
technique previously described as “simple
linear regression.” The chi-square evaluation
showed that the null hypothesis is accepted
and that there were no statistically significant
differences between the transfer groups.


Discussion 


Original Learning. The results indicate 
plainly that performance to criterion under 
these experimental conditions was independ- 
ent of control friction with the response meas- 
ure used. Of the literature previously cited, 
both Hick and Clarke (9) and Gray and Ellson
(6) have obtained similar results.


F. A. Muckler and W. G. Matheny 


Transfer of Training. The results indicate
that a change in friction had very little effect
on the level of performance. The lowest
mean transfer for an experimental group was
86 per cent for the 0 pound group. Since
there were no significant differences between
experimental transfer groups, these data suggest
that transfer of training in this tracking task
was relatively independent of control friction.

The implication of these data for training
devices seems clear. Where control pressures
are a variable, optimum transfer will be
obtained by exact simulation; nevertheless, little
will be lost if the control force is varied.
Obviously, since this conclusion rests on the
results of a relatively simple laboratory task,
further validation with specific training
devices seems necessary.


Summary 


The effect of transfer from several levels
of friction to another level of friction in a
manual control system was investigated.
Transfer effect was found to range from 86 to
93 per cent positive transfer; it was concluded
that transfer was relatively independent of
control friction under the conditions of
this study. Finally, control friction had no
apparent influence on original learning with
the criterion measure used.


Received October 22, 1953. 


References

1. Bilodeau, E. A. Decrements and recovery from
decrements in a simple work task with variation
in force requirements at different stages
of practice. J. exp. Psychol., 1952, 44, 96-100.

2. Craik, K. J. W. Theory of the human operator
in control systems. I. The operator as an
engineering system. Brit. J. Psychol., 1947, 38,
56-61.

3. Edwards, A. L. Experimental design in psychological
research. New York: Rinehart, 1950.

4. Fitts, P. M. Engineering psychology and equipment
design. In S. S. Stevens (Ed.), Handbook
of experimental psychology. New York:
Wiley, 1951, pp. 1287-1340.

5. Gagne, R. M., Foster, H., and Crowley, Miriam E.
The measurement of transfer of training.
Psychol. Bull., 1948, 45, 97-130.

6. Gray, Florence E. and Ellson, D. G. Effects of
friction and mode of operation upon accuracy
of tracking with the GE pedestal sight.
Air Mat. Comm., Aero Med. Lab. Report
TSEAA-694-2c, 1947.

7. Helson, H. Design of equipment and optimal
human operation. Amer. J. Psychol., 1949,
62, 473-497.

8. Hick, W. E. Friction in manual controls with
special reference to its effects on accuracy of
corrective movements in conditions simulating
jolting. Mot. Skills Res. Exch., 1949, 1,
9. (Abstract.)

9. Hick, W. E. and Clarke, P. The effects of heavy
loads on handwheel tracking. Mot. Skills
Res. Exch., 1949, 1, 20. (Abstract.)

10. Jenkins, W. L., Mass, L. O., and Rigler, D.
Influence of friction in making settings on a
linear scale. J. appl. Psychol., 1950, 34, 435-439.

11. Jenkins, W. L., Mass, L. O., and Olson, M. W.
Influence of inertia in making settings on a
linear scale. J. appl. Psychol., 1951, 35, 208-213.

12. Mood, A. M. Introduction to the theory of
statistics. New York: McGraw-Hill, 1950, 406-407.

13. Raines, A. and Rosenbloom, J. H. Ideal torques
for handwheels and knobs. Machine Design,
1946, 18(8), 145-148.

14. Reed, J. D. Factors influencing rotary pursuit.
J. Psychol., 1949, 28, 65-92.

15. Searle, L. V. and Taylor, F. V. Studies in tracking
behavior. I. Rate and time characteristics
in simple corrective movements. J. exp.
Psychol., 1948, 38, 615-631.

16. Vince, Margaret. Corrective movements in a
pursuit task. Quart. J. exp. Psychol., 1948,
1, 85-103.

17. Woodworth, R. S. Experimental psychology.
New York: Henry Holt, 1938.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


A Correction of the Clark-Owens Validation Study of the 
Worthington Personal History Technique 


Robert F. Peck 


Worthington Associates, Chicago, Illinois 


and 


William Stephenson 
University of Chicago 


In a recent paper (1) on the Personal His- 
tory method, the following conclusions were 
drawn: 


1. Five isolated personality trait scores, 
from two standard inventories, were a more 
“efficacious” assessment device than individ- 
ual reports derived by the Worthington Per- 
sonal History method. “Efficacy” was not 
defined by the authors, but presumably they 
meant accuracy in predicting the job effec- 
tiveness and promotability of the industrial 
employees in the study. 

2. This (they said) “constitutes damaging 
evidence as to the usefulness of the Personal 
History.” 

3. Furthermore (they continued) “these 
results . . . tend to follow the pattern of 
dubious or negative results found in valida- 
tional studies of other projective techniques.” 

As a matter of fact, even the selected data 
included in the Clark-Owens report would 
lead an impartial investigator to exactly the 
opposite conclusion on each of these points. 

Part of the explanation appears to lie in
the fact that several major errors were
committed in designing and executing this little
study. Constructive corrections for these
were recommended to Clark on January 6,
1953, following a conference with him in
December, 1952; but the issues still appear to
be disregarded in the recent Clark-Owens
article.


1. The criterion was a set of ratings by
co-workers (not supervisors), according to
this pattern: Judges 1 and 2, in Dept. X,
rated subject A; judges 3 and 4, in Dept. Y,
rated subject B; and so on. No attempt was
made to find the comparability of ratings
made by different judges in different
departments. Thus, the reliability of the criterion
is an unknown quantity, with an error of
unknown but undoubtedly considerable size. If
it were not that these ratings proved
significantly related to both the PH ratings and
the standard inventories, implying some degree
of meaningful stability in the criterion, this
feature would invalidate the entire study.

2. The research was ultimately restricted
to a few traits, apparently because only these
traits could be measured by the inventories.
The proper procedure, of course, to test
the PH reports, would be to measure all the
traits which the PH covers. (Editorial policy
does not allow space for an illustrative PH
report. See reference 10, for an example.)
The task of measuring the interaction of
traits, which the PH undertakes to do, is
beyond the scope of standard-inventory scores,
of course, especially if the scores are used
singly. Perhaps for this reason, this larger
issue was ignored. In short, the study was
not really adequately designed to test the
validity of the PH.

3. A peculiar and persistent error in using
the Chi-square method is explained below in
Conclusion No. 2.

4. Despite the fact that Clark and Owens
(erroneously) termed the contingency
coefficients for both PH and inventories “not
significant,” they proceeded to compare the
coefficients for the two methods, though only
on five personality traits. In doing this, they
apparently did not realize that contingency
coefficients from different sets of data cannot
be compared unless a class-index correction
is applied (2). These are not correlation
coefficients. Without the correction, it is not
possible to tell whether a C of .75 from one
set of data is larger, equal, or smaller than a
C of .65, .75 or .85 from different data. This
is a relatively minor point, but it is still an
error.
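
Why contingency coefficients from tables of different sizes are not directly comparable can be shown with a short Python sketch. It computes Pearson's C = sqrt(chi-square / (chi-square + N)) and the upper limit sqrt((k - 1) / k) that C can reach in a k x k table. The function names and the two small tables are hypothetical, and dividing C by this limit is only one common adjustment, not necessarily the class-index correction given by Kelley (2).

```python
import math


def chi_square(table):
    """Return (chi-square, N) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient C."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency(k):
    """Largest value C can take in a k x k table."""
    return math.sqrt((k - 1) / k)


# Hypothetical tables, each showing a perfect association.
perfect_2x2 = [[10, 0], [0, 10]]
perfect_3x3 = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]

print(round(contingency_coefficient(perfect_2x2), 3), round(max_contingency(2), 3))
print(round(contingency_coefficient(perfect_3x3), 3), round(max_contingency(3), 3))
```

The same perfect relationship gives a C of about .71 in the 2 x 2 table and about .82 in the 3 x 3 table, so a raw comparison of two C values mixes strength of relationship with number of classes.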


The Correct Conclusions from the Data 


Despite the questionable or fallacious pro- 
cedures, the actual data which Clark and 
Owens report clearly show the following facts: 

1. The Personal History reports were trans- 
lated into personality-trait ratings and into 
job performance ratings, by five psychologists,
with a high degree of reliability
(Adjustment to Others .91, Job Effectiveness .79,
Promotability .93, for example).

2. The Personal History ratings thus ob- 
tained showed a high, significant relationship 
to the criterion, both on the personality traits 
and on Job Effectiveness, Adjustment to Co- 
Workers, and Promotability. 


Contingency Coefficients (C), PH vs. Ratings

Active                     .605
Impulsive                  .655
Dominant                   .654
Stable                     .676
Sociable                   .585
Job-effectiveness          .513
Promotion Possibilities    .697
Adjustment to Others       .614


Through a misuse of chi-square methods
(pointed out to them in the letter of January
6, 1953), the authors report that these
contingency coefficients, ranging from .51 to
.70, are “not statistically significant.” Mr.
Clark reported, in December 1952, that this
happened because the 47 subjects were
subdivided into many cells, several of which
contained less than 5 cases. Since an extremely
large correction factor has to be applied (a
procedure which is not acceptable, even
technically, to most statisticians), almost no
coefficient would appear significant, no matter
how high. This is a technically possible, but
meaningless, procedure. However,
their own findings indicate that if proper
chi-square divisions were applied to these data,
both the Personal History and the inventories
would show a significant degree of relationship
with the criterion ratings. This, at
least, is our considered opinion, and that of
several other statistically competent
psychologists (3, 4).

3. The standard inventories showed a sig- 
nificant relationship to the criterion on five 
isolated personality traits, of about the same 
order as the PH-criterion relationship on 
these five traits (Active, Impulsive, Domi- 
nant, Stable, Sociable). However, these in- 
ventory trait scores show no power to predict 
Job Effectiveness, Adjustment to Co-workers, 
or Promotability. Indeed, it appears from 
the Clark-Owens report that no effort was 
made to attempt such a prediction from the 
inventories, although the criterion was avail- 
able. 

4. Thus, on the crucial criteria for de- 
termining the efficacy, as well as the validity, 
of any assessment method (5)—Adjustment 
to Co-workers, Job Effectiveness, and Pro- 
motability—the Clark-Owens data show that 
the Personal History method was significantly 
effective. Since the authors report no at- 
tempt to measure the predictive power of the 
standard inventories against these criteria, 
the “efficacy” of those inventories for predict- 
ing job performance remains wholly untested 
and unproven. Indeed, since the PH meas- 
ured the individual traits about as well as the 
inventories, and additionally measured job 
performance, it would seem that the inven- 
tories are not needed, in this setting. This is 
contradictory, of course, to the statements 
Clark and Owens made about “efficacy.” 

5. Clark and Owens’ remark about “dubi- 
ous or negative” findings on other validation 
studies of projective techniques requires refer- 
ence to numerous studies which have demon- 
strated positive validity for these methods. 
Naiveté, or errors of logic, in research design 
and in the use of statistical methods, have 
frequently resulted in “dubious” findings. 
However, properly designed research has re- 
peatedly shown that projective techniques, 
among them the Personal History method, 
can be valid predictors of overt, daily be- 
havior in the work world, as well as in the 
clinic (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16). 


Received June 4, 1954. 
Published out-of-turn by the editor. 




References

1. Clark, J. G. and Owens, W. A. A validation
study of the Worthington Personal History
blank. J. appl. Psychol., 1954, 38, 85-88.

2. Kelley, T. L. Statistical method. Macmillan
Co., N. Y., 1924, p. 266.

3. Cureton, E. E. Validity, reliability and baloney.
Educ. psychol. Measmt., Spring, 1950, 10, 94-96.

4. Lewis, D. and Burke, C. Use and misuse of chi
square technique. Psychol. Bull., 1949, 46,
433-489.

5. Thorndike, R. L. Personnel selection. John
Wiley & Sons, N. Y., 1949.

6. Beck, S. J. The six schizophrenias: reaction
patterns in children and adults. Res. Monog.
No. 6, Amer. Orthopsychiatric Assoc., Inc.,
N. Y., 1954.

7. Endacott, J. I. Methodology for the study of
clinical cases by the way of Rorschach and
psychoanalytic theories. Unpublished Ph.D.
dissertation, The University of Chicago, 1954.

8. Henry, W. E. The Thematic Apperception
technique in the study of culture-personality
relations. Genet. Psychol. Monog., 1947, 35, 1-135.

9. Nevis, E. C. The effectiveness of the Worthington
Personal History technique in assessing
Air Force officers for command and staff
leadership. Unpublished Ph.D. dissertation,
Western Reserve University, 1954.

10. Peck, R. F. and Thompson, J. M. The use of
individual assessments in a management
development program. J. personnel admin.
industr. Relat., April, 1954, 1, 79-98.

11. Peck, R. F. and Worthington, R. E. New technique
for personnel assessment. J. personnel
admin. industr. Relat., January, 1954, 1, 23-30.

12. Spencer, G. J. and Worthington, R. E. Validity
of a projective technique in predicting sales
effectiveness. Personnel Psychol., 1952, 5,
125-144.

13. Stephenson, W. Q-methodology and the projective
techniques. J. clin. Psychol., 1952, 8,
219-229.

14. Swint, E. R. The Worthington Personal History:
a report. J. ind. Train., Nov-Dec.,
1950.

15. Swint, E. R. and Newton, R. A. The Personal
History: a second report. J. ind. Train.,
Jan-Feb., 1952.

16. Worthington, R. E. Use of the Personal History
form as a clinical instrument. Unpublished
Ph.D. dissertation, The University of
Chicago, June, 1951.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


A Reply to Drs. Peck-Stephenson 


William A. Owens, Jr. 
Iowa State College 


Drs. Peck and Stephenson, as might have
been anticipated from their obvious interest,
have seen fit to make some interesting and
ingenious comments upon the Clark-Owens
study (1) of the Worthington Personal History
Blank (PH). However, since their comments
purport to be “a correction” they
should be examined in order.

1. Peck and Stephenson say the criterion
employed, that of associates’ ratings, is
unreliable, although they state that it “proved
significantly related to both the PH ratings
and the standard inventories.” Actually,
these relationships were not statistically
significant, although this “unreliable criterion”
was found to be more closely related to standard
inventory results than to PH results (on
the only five traits presumably measured by
both) five times out of five. In this regard,
at least, it was quite consistent.

2. Peck and Stephenson seem to feel that
the five traits common to PH and the available
standard inventories were not enough to
constitute any real evidence as to the validity
of PH. It was, of course, only in the case of
these five traits that PH could really be
evaluated, since low criterion relationships
could well be attributed to low criterion
reliability or validity unless they were
differentially low. They also state that the PH
measures the interaction of traits (we presume
clinically, since no quantitative evidence
is quoted), and that it, therefore, goes beyond
the scope of standard inventories in a global
direction. However, the obtained Clark-Owens
estimate of the relationship between
PH results and criterion ratings is lowest in
the case of “Job effectiveness” (a complex
characteristic) and about average for “promotion
possibilities” and “adjustment to
others.” It would thus seem that the PH
does not yield better global than simple
estimates in this sample.

3. Peck and Stephenson accuse Clark and
Owens of a “peculiar and persistent error in
using the Chi-square method,” in spite of
their earlier advice to us. Let us examine
their arguments. (a) They say that we divided
our 47 cases among too many cells,
“several of which contained less than 5 cases.”
However, the theoretical consideration relates
to expected frequencies, not to observed, and
even so, the number 5 is relatively arbitrary
(5). (b) They imply that some enormous
correction for continuity should have been
made and was ignored, whereas Cochran
(2) states that “Tables with more than 1
degree of freedom and some expectations
greater than 5 - should - use χ² without
correction for continuity.” (c) Finally, they
conclude that, in their considered opinion, both
the PH and standard inventory results would
be significantly related to the criterion in,
say, a 2 × 2 table. How this could happen is
a bit hard to understand, since Guilford (3)
states, “There is probably nothing to be
gained by applying Yates’s correction when
there is more than 1 degree of freedom.”
And again, still quoting Guilford, “The effect
of the correction is to reduce the size of χ².”
Thus, had Clark-Owens followed the procedure
suggested by our critics, the effect
would have been to remove the obtained χ²
values still further from significance.
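
The distinction drawn in (a) between observed and expected frequencies can be illustrated with a small Python sketch. The 3 x 3 table below is invented for illustration (it is not the Clark-Owens data) and merely totals 47 cases; the function name `expected_frequencies` is likewise hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts for 47 cases.
observed = [
    [8, 5, 2],
    [5, 7, 4],
    [2, 6, 8],
]

expected = expected_frequencies(observed)
small_observed = sum(1 for row in observed for cell in row if cell < 5)
small_expected = sum(1 for row in expected for cell in row if cell < 5)

print("cells with observed count below 5:", small_observed)
print("cells with expected count below 5:", small_expected)
```

In this invented table three observed counts but four expected counts fall below 5, which shows that a check on small cells has to be made on the expectations and can give a different answer from a glance at the observed counts.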

4. Peck and Stephenson say, quite correctly,
that contingency coefficients cannot be
compared without making a class-index
correction. They also say, even more correctly,
that “this is a relatively minor point.”
Actually, making this correction would do
practically nothing to the relative magnitudes of
PH vs. test validities. The test coefficients
would tend to receive larger corrections, since
they are initially larger; and the PH would
tend to receive larger corrections because the
number of cells is somewhat smaller. If Peck
and Stephenson had bothered to compute it,
they could have observed that the differential
shifts could not have exceeded .01 or .02.
This, of course, would not remotely approach
changing the direction of a single difference,
and direction is all that is involved in the
randomization test.

5. In their second section, purportedly 
dealing with “Correct Conclusions from the 
Data” Peck and Stephenson become very 
seriously, if not willfully, confused. They ap- 
pear to mistake an omission for a negative 
result saying, “these inventory trait scores 
show no power to predict Job-Effectiveness, 
Adjustment to Co-workers, or Promotability.” 
The reason they do not is that, in an attempt 
to be fair to the PH method, Clark-Owens 
did not report them. Actually, three of our 
judges subsequently did considerably better 
in predicting these three characteristics from 
the objective test results than from the PH. 
However, they were more familiar with the 
former, and the data may have been slightly 
contaminated. In any case, it was Clark-Owens’
stated purpose to evaluate PH vs.
objective tests, not PH vs. objective tests
plus an imponderable interpreter of them.
Peck and Stephenson surely realize that the
tests do not yield scores on these three
characteristics.


6. Finally, Clark-Owens’ critics take them
to task for a comment about “the pattern of
dubious or negative results found in validational
studies of other projective techniques.”
An answer to them requires only a reference
to Schofield (4), who summarizes all validity
studies reported in 1949, 1950, and 1951,
and indicates that two-thirds to three-fourths
of them yielded negative results.

All-in-all, Clark-Owens must firmly reject
the alleged corrections of Peck-Stephenson,
although fully granting the limitations of
their study as originally set forth.


Received July 20, 1954. 
Published out-of-turn by the editor. 


References

1. Clark, J. G. and Owens, W. A. A validation
study of the Worthington Personal History
Blank. J. appl. Psychol., 1954, 38, 85-88.

2. Cochran, W. G. The χ² test of goodness of fit.
Ann. Math. Statist., 1952, 23, 315-345.

3. Guilford, J. P. Fundamental statistics in psychology
and education. New York: McGraw-Hill,
1950.

4. Schofield, W. Research in clinical psychology: 1951.
J. clin. Psychol., 1952, 8, 255-261.

5. Walker, Helen M. and Lev, J. Statistical inference.
New York: Henry Holt, 1953.


The Journal of Applied Psychology
Vol. 38, No. 5, 1954


Applied Psychology in Action 


GATB in Foreign Countries 


Beatrice J. Dvorak 


Testing Branch, U. S. Employment Service, Washington 25, D. C. 


The USES General Aptitude Test Battery
has been translated into a number of foreign
languages, and research is being conducted in
various foreign countries to adapt and standardize
it for use on populations in those countries.
Permission has been granted by the
U. S. Employment Service to the following
organizations and individuals to use the
GATB in such research. While information
is not available regarding the status of all of
these projects, it is known that the French,
[illegible], Portuguese, and Spanish editions
have already been published.


Argentina

A. Pourteau Agote
Universidad de Buenos Aires
Departamento Psicotecnico
Republica Argentina


Australia

H. A. Bland
Department of Labour and National Service
Melbourne, Australia


Belgium

R. Buyse
University of Louvain
Tournai, Belgium

Herickx
Office d'Orientation
Bruxelles, Belgium

Peygeals
Psychotechnicien de la Société Nationale
des Chemins de Fer Vicinaux
Bruxelles, Belgium

Capitaine Commandant Hourman
Centre d'Orientation
Ministère de la Defense Nationale
Bruxelles, Belgium

Vandenborre
Ministère de l'Instruction Publique
Bruxelles, Belgium


Brazil

Jac Magalhaes
Divisao de Organizacao do Trabalho
Rio de Janeiro, Brazil

Livraria Oscar Nicolai
Caixa Postal 246
Brazil

S. J. Schwarzstein
Director do Servico de Colocacao e Informacao
Profissional
Sao Paulo, Brazil

Secretaria do Trabalho
Servico de Colocacao e
Informacao Profissional
Sao Paulo, Brazil

Canada

Morgan D. Parmenter
Director, The Guidance Centre
University of Toronto
Toronto 5, Canada


China

Ministry of Social Affairs
Shanghai, China


Denmark

Poul Bahnsen
Director, Psykotekniske Institut
Copenhagen K., Denmark

Paul Vidriksen
Arbejdsdirektoratet
Kopenhavn, Denmark


England

M. Desai
Psychological Department, London County
Council
London, England

H. J. Eysenck and J. Tizard
The Maudsley Hospital
London S. E. 5, England

C. B. Frisby
Director, National Institute of Industrial
Psychology
London W. C. 2, England

Roland Harper and D. R. Martin
The University of Leeds
Leeds 2, England

B. W. Richards
St. Laurence’s Hospital
Caterham, Surrey, England

Constance M. Mathieson
East Anglian Regional Hospital Board
Norwich, Norfolk, England

Alec Rodger
Birkbeck College, University of London
London, England


India

Vocational Guidance Bureau
Bombay, India


Italy

Ing. Vincenzo Flagiello
Societa per l'Industria e l’Elettricita
Centro Istruzione Professionale
Viale Benedetto Brin
Terni, Italy

Agostino Gemelli
Director, Laboratorio di Psicologia Sperimentale
Milano, Italy

Guido Majaron
Viale Arnaldo Fusinato 2F
Vicenza, Italy

Consiglio Nazionale delle Ricerche
Instituto Nazionale di Psicologia
Rome, Italy


New Zealand

Auckland University College
Auckland C. 1, New Zealand

W. J. H. Clark
Vocational Guidance Centre
Auckland, New Zealand


Peru

Santiago Salinas
Ministerio de Trabajo y Asuntos Indigenas
Lima, Peru


Philippines

Antonio V. Roxas
Escolta, Manila, Philippines


Scotland

P. S. Boyd and W. M. Miller
Department of Mental Health
Aberdeen, Scotland


South Africa

D. J. Du Plessis
Department of Labor
Johannesburg, Union of South Africa

C. P. J. Erasmus
University of the Orange Free State
Bloemfontein, South Africa

Department of Psychology
University of Stellenbosch
Stellenbosch, South Africa

Evryl Fisher
Church Street
Cape Town, South Africa


Sweden

Torsten Husen
Centrala Varnpliktsbyran
Personalprovningsdetaljen
Stockholm 10, Sweden


Switzerland

J. F. Herzog
Office d'Orientation Professionelle
Neuchâtel, Switzerland

Ph. H. Muller
Université de Neuchâtel
Neuchâtel, Switzerland


Turkey

Faruk Kardam
Director-General of the Turkish Employment
Service
Ankara, Turkey

Book Reviews 


Tuckman, J. and Lorge, I. Retirement and 
the industrial worker: prospect and reality. 
New York: Bureau of Publications, Teach- 
ers College, Columbia University, 1953. 
Pp. xvi + 105. $2.75. 

This book reports the results of a survey
undertaken at the request of the New York
Cloak Joint Board of the International Ladies’
Garment Workers’ Union. The study investigated,
by means of personal interviews, the
attitudes toward retirement of three different
groups of persons. These groups consisted
of (1) 204 men and women still on
their jobs, (2) 216 men and women who had
submitted applications for retirement but who
were still working, and (3) 240 retired persons.
All interviewees were or had been
members of the above named union, and all
had made their livelihood in the needle
trades. The schedules used in the interviews
were designed to obtain information
relating to a wide variety of employment-retirement
questions which fall generally under
six headings. These headings form the
outline for the book and include: Retirement
Attitudes, Health, Pressure Effect of Aging
on Work Performance, The Worker’s Preparation
for Retirement, Effect of Retirement on
[illegible], and Factors Related to Retirement
Attitudes.

Results are, of course, reported in terms
of percentages of respondents falling in each
of the response categories. Statistical significance
is tested by means of chi square.
It is to the authors’ credit that they
stay close to their facts and figures. They
do not commit the error (so common in the
literature about the problems of older employees)
of launching into long opinionated
discussions. Nor do they attempt to derive
generalizations from their data which are not
warranted by the narrowness of the population
studied.

The last pages of the book consist of an
excellent summary of the study and a short
section devoted to conclusions and recommendations.
It is this last section that will
be most useful to other persons doing research
on older employee utilization. For it
is here that one finds a wealth of hypotheses
that need to be tested on a broader basis.
The primary barriers to the utilization and/
or happy retirement of older persons are
clearly outlined and questions are formulated
which could well form the framework for
other research programs designed to find
methods of overcoming these barriers.

Presentation of the survey results could 
have been made much clearer and more easily 
understood. As it is, the reader is confronted 
with table after table of percentages which, 
although clearly titled and well organized, 
finally contribute to an overwhelming sense 
of boredom. Simple bar diagrams, pie charts, 
frequency polygons, and histograms could 
have been used to great advantage to facili- 
tate quick and accurate interpretation of the 
results presented. 

This book represents one of the most ex- 
tensive researches into the attitudes and prob- 
lems of working, retiring, and retired work- 
ers yet performed. As such, it is a “must” 
for persons engaged in the study of employ- 
ment and retirement problems of older em- 
ployees. In addition to the wealth of data 
presented, it is a rich source of research hy- 
potheses, and points up the problems which 
must still be solved by researchers in this area. 

Marvin D. Dunnette 


Industrial Relations Center 
University of Minnesota 


Berdie, R. F. (Editor). Roles and relation- 
ships in counseling. Minnesota Studies in 
Student Personnel Work, No. 3. Min- 
neapolis: University of Minnesota Press, 
1953. Pp. 37. $1.25.

This publication consists of three papers 
presented at the Second Annual Conference 
of Administrators of College and University 
Counseling Programs held at the University 
of Illinois in 1951.

In the first paper, John Gustad discusses
the definition of counseling. Clinical psychology
and counseling can be considered to
be essentially “one general kind of endeavor
but with differing emphasis.” Both include
psychotherapy “where appropriate to the
client and within the province of the practitioner.”
Teaching and counseling are differentiated
largely in terms of different training
and experience. His definition of counseling
stresses the role of learning and the
requirement of professional competence. The
analysis of the problem and review of the
literature are helpful.

In the next paper Ralph Berdie describes 
the tactics and techniques developed in the 
Counseling Bureau at Minnesota to deal with 
problems of human and institutional relation- 
ships (which he terms public relations “in a 
very limited sense”). His base point is ef- 
fective service. Beyond this he describes a 
number of gambits to improve client rela- 
tionships and outlines an equally active pro- 
gram to promote intra- and extra-institutional 
staff relationships. Counseling administra- 
tors will find a number of suggestions for per- 
formance of a function rarely discussed in 
print. Unfriendly voices will perhaps find 
evidence to reinforce their suspicions of “em- 
pire-building.” 

Harold Pepinsky’s concluding paper argues 
the thesis that the counseling psychologist 
“can help to build a culture in which the in- 
dividual members are able to communicate 
with each other, to respond positively to each 
other, and to work together toward common 
group objectives,” and provides a rationale 
for the use of group procedures in pursuit of 
this aim. Such activities appear to be geared 
to reaching a much larger proportion of the 
student body in the interests of community- 
wide mental health. 

On a limited sector, the college or university
campus, these papers exemplify a phenomenon
of our times: the development (including
growing pains) of new professional
groups eager to provide experiences aimed at
helping man to cope with the “human predicament.”
It is understandable that counseling
psychologists are zealous and ambitious;
the need is great.


Arthur H. Brayfield
Kansas State College 


Bross, Irwin D. J. Design for decision. New
York: Macmillan, 1953. Pp. viii + 276.
$4.25.


Every so often a book arrives for me to 
review that I find interesting and exciting.
Bross’s Design for decision is one of the few 
that falls into this category. It was read 
hastily once with enthusiasm and unflagging 
interest and almost without interruption of 
any sort. I could hardly wait to finish one 


Book Reviews 


chapter so that I could move on to the next.
During the weeks that followed the first reading,
I picked up the book many times to read
various sections at a more leisurely pace and
my enthusiasm for it has not diminished to
any noticeable extent.

It may be granted that the book introduces
nothing that is not the common knowledge
of all modern statisticians. But it does
say what it has to say in a manner that few,
if any, other modern statisticians have been
able to say it in. Bross can write and he
writes very well indeed.

Do not let that word “statisticians” that I
used above mislead you. This is not a book
about statistics in the sense in which you
may interpret that word. It is not, for example,
a collection of formulas, illustrative
calculations, mathematical derivations or
proofs. As the author states, no mathematics
is required for reading it other than high
school algebra. As a matter of fact, even if
you have forgotten your high school algebra,
you'll still get along with the text pretty well.
Rather, this is a book about decision making
or, more precisely, about statistical decision.

What is statistical decision? You may find
some indication of what statistical decision
involves from the following listing of chapter
headings: history of decision, nature of decision,
prediction, probability, values, rules
for action, operating a decision-maker, sequential
decision, data, models, sampling,
measurement, statistical inference, statistical
techniques, design for decision.

My answer, although admittedly inadequate,
is that statistical decision is a method
for making decisions that has its origins in
a variety of specialized fields. I might even
go so far as to identify statistical decision
with scientific method, though Bross may not
agree with this viewpoint. Anyway, the best
answer as to what statistical decision is can be
obtained by reading Bross’s book for yourself.

I should add that there is something in
this book for everyone. If you have no statistical
training, Design for decision will tell
you how a modern decision-maker operates
without overwhelming you with mathematical
details. If you have some experience in applied
statistics, then, as Bross points out, you
may find that this book “provides a vantage
point from which it is possible to see all of the
techniques in their proper perspective.”

“Some readers may be intrigued by the
ideas of Statistical Decision because they represent
a new advance toward the solution of
a [...] human problem. The principles have a
[...] scope; they apply to the choice of a
[...] policy or to the private decisions that
[...] must make. They are, if you like,
philosophical principles, a way of looking at
the world in which we live, a guide to action
in that world.”
If the above paragraph from the introductory
chapter of Bross’s book doesn’t whet your
appetite and stir you to march out to your
nearest library or bookseller for a copy of
Design for decision, then you are a lost cause
and no additional words of praise of mine for
this book are going to help.

Allen L. Edwards 


The University of Washington 


Remmers, H. H. Introduction to opinion and
attitude measurement. New York: Harper
& Bros., 1954. Pp. 437. $5.00.


Prepared as a college textbook, this volume
by H. H. Remmers, professor of psychology
and director of the Division of Educational
Reference at Purdue University, offers
a panoramic view of the field of opinion and
attitude measurement, with emphasis divided
between method and application.

“The realization is rapidly growing,” the
author says, “that attitudes, the way individuals
and groups feel about the various aspects
of their world, are probably more determinative
of behavior than mere cognitive
understanding of this world.”

Part I is devoted to a discussion of techniques,
including sampling and statistical
theory, scaling, single question evaluation,
the “summated questionnaire,” and some of
the “less direct measures of attitudes”: projective
methods, sociometric approaches, rating
scales, and the concept of empathy.

In Part II, Dr. Remmers describes many
of the varied uses to which attitude and opinion
measurements have been put in business,
industry, the government, the study of community
interrelations, and education.

The book is fairly comprehensive, succinct,
and, in the main, a readable presentation.
Appended to each of its dozen chapters are a
brief critical summary, a list of questions,
and a bibliography.

The chapter on scaling techniques contains 
an able exposition of the Thurstone and 
Likert contributions, moves on to scale analy- 
sis as developed by Guttman, and deals in 
some detail with “the Cornell technique,” be- 
cause, says Dr. Remmers, it appears to be the 
one “most likely to be feasible in the greatest 
variety of situations.” 

His description of personality, interest, and 
problem inventories is of equal merit. He 
offers a quite lengthy report on the procedures 
that were followed in developing the Science 
Research Associates’ Youth Inventory, under 
auspices of the Purdue Opinion Panel, with 
which the author is identified. 

Dr. Remmers treats extensively of the ap- 
plications of attitude and opinion measure- 
ments by educators. He reviews also the 
utilization of similar methods by the busi- 
nessman to improve his advertising programs 
and his products; by industry in the study of 
employee attitudes, plant morale, absentee- 
ism, and workers’ opinions of members of 
minority groups; by social researchers in the 
analysis of intergroup and interpersonal re- 
lationships. 

Unhappily, the book appears to lack fresh- 
ness. Some of the material is obviously 
“dated”; one gains the impression that the 
author, except in a few instances, stopped 
collecting data along about 1947 or 1948, 
though much that is worth while has appeared 
in the literature since then. To illustrate 
Census Bureau sampling, he describes what 
was done in the 1940 census; to describe the 
government’s uses of attitude and opinion 
studies, he dwells on World War II operations;
to indicate the scope and nature of the
Survey of Consumer Finances, he discusses 
what was done in the first year, 1946. With 
a mild apology for its absence, the author 
omits any material on how the television in- 
dustry has put social science research to work. 

The implication conveyed by the word “In- 
troduction” in the title, that this is a textbook 
for beginners, may be somewhat misleading; 
it quickly becomes apparent that the college 
student will find himself in deep water unless 
he has been forearmed with preliminary work 
in statistics and psychology. 

Notwithstanding, the volume is a scholarly
and well-planned treatise. In writing it, Dr.
Remmers has made a substantial contribution
toward effecting the kind of “popular
understanding of the importance and implications”
of the findings of the social scientists
which he, at the outset, urges. The
book is one of Harper’s Psychological Series,
of which Gardner Murphy is editor.

Sidney S. Goldish


Minneapolis Star and Tribune 


Sherif, Muzafer and Wilson, M. O. (eds.). 
Group relations at the crossroads. New 
York: Harper, 1953. Pp. viii + 379. $4.00. 
Like the preceding Social Psychology at
the Crossroads, this volume is a collection of
papers emphasizing social-psychological concepts
and explanations, prepared for a conference
at the University of Oklahoma (April,
1952).

t summary 

introduction. Next comes J. P. Scott’s “Im- 

Behavior for 

This is one 

ews of the 

es the con- 
to describe 
behavior. 

sychological Traits and 

Anne Anastasi traces in 


ous traits and gro’ 
Anselm Strauss’ “Concepts, Communication
and Groups” discusses the primary importance
of language in the development of
human social behavior.

ption and Psychology
of Perceptual Learning” is an outline
of the process of perceptual learning in
terms of generalization and differentiation.

Gardner Murphy’s “Knowns and Unknowns
in the Dynamics of Social Perception”
considers the importance of differential
group membership on differential perception,
the lines of cleavage between groups, and
their significance.

“Development of the Small-Group Research
Movement” by R. E. L. Faris covers
mainly sociological studies in the area.



Herbert Blumer’s “Psychological Import of
the Human Group” is a reiteration of the
need to study both the group situation and
the individual in developing an adequate
theory of human interaction. An attempt is
made to spell out some significant elements
of group behavior.

Sherif’s paper on reference groups reiterates
the importance of reference groups as
distinct from membership groups in understanding
social behavior.

Leon Festinger’s “An Analysis of Compliant
Behavior” is a discussion of the empirical
validity of two hypotheses: (1) public
compliance without private acceptance will
occur if the person in question remains compliant
and in the group to avoid punishment;
(2) public compliance with private acceptance
will occur where it is satisfying to remain
with those influencing the person.

Launor Carter’s “Leadership and Small
Group Behavior” is primarily a summary of
his experimental studies on the [...] of
leaders, the generality of leadership, and the
effects of the group on leader behavior.

The need to consider social distance [...]tively,
and to differentiate it from prejudice,
is the theme of Mozell Hill’s paper.

Nelson Foote and Clyde Hart’s “Public
Opinion and Collective Behavior” points out
(1) the dangers of depending only on poll
answers to gauge public opinion; (2) the
possibilities of analyzing public behavior in
order to assess public opinion.

Helen Jennings concludes with a paper
combining conclusions from sociometric studies
with, as yet, unpublished sociodramatic examples
to point out their significance in understanding
personality and group formation.

Despite the heterogeneity of aims and
methods of each of the papers, the reviewer
will hazard presenting some overall impressions
of the book.

1. While such general psychology topics as
perceptual development have been [...],
contributions from many of the largest research
programs on group relations such as
[...]. Anthropology is also absent.

2. Some of the papers could have been
more of a contribution had they more extensively
surveyed the literature. Yet, in general,
most papers tended to maintain a high
standard of excellence.

3. Social psychologists seem to be trying
hard to adopt sociological concepts and to
integrate their work via “interdisciplinary”
research with the other social sciences. It
may be both more parsimonious and profitable
for social psychologists to integrate their
work with the general psychology of
learning, perception and motivation. This
does not mean that rejecting many sociological
concepts will lead to ignoring the nature
of the stimulating situation while studying
social behavior. Rather, the situation will
be described in terms which will lend themselves
more readily to integration with psychological
concepts describing the other equally
important determinants of social, and all behavior,
i.e., the behavioral history, motivation
and biological level of the behaving organisms.

Bernard M. Bass 


Louisiana State University 


Schlotter, Bertha E. and Svendsen, Margaret.
An experiment in recreation with the mentally
retarded. (Rev. ed.) State of Illinois,
Department of Public Welfare. Published
by National Mental Health Funds,

1. Pp. 142. Gratis. 

This book is a re-issue of the volume published
in 1932. Additions are a new introduction
by the director of the Department of
Public Welfare, and a thirteen page preface
by Bertha E. Schlotter which provides an
overview of the continuing effects of the recreation
program begun in 1929. Requests
from other institutions for copies of the
earlier publication led to this re-issue.

The Illinois Institute for Juvenile Research,
in a survey of the recreational program
at the Lincoln State School and Colony
in 1929, reported institutional overcrowding,
inadequate facilities and staff, poor use of
facilities, and overemphasis on maintaining
quiet and order. Recreational activities provided
for active participation by only 100 of
the [...]00 patients then under care. The establishment
of a department of recreation and
a one-year experiment with a recreation program
on an institution-wide basis followed.
This book discusses staff qualifications, in-service
training programs, grouping of patients,
and equipment and space problems.
Specific lists of equipment, musical selections, 
books, and activities are included, with com- 
ments concerning their use and modification. 
About half of the book is devoted to “socio- 
psychological analysis of play activities.” 
This section classifies activities in several 
ways: alphabetically, with minimum MA 
indicated; grouped for several MA levels; 
according to the degree of motor activity; 
according to need for equipment; and ac- 
cording to type of social organization and 
participation. 

There are some important values of the 
book: the inclusion of lists of source books 
for games, songs, activities, and dances is of 
special interest to the recreational worker; 
the beginning worker will benefit by the 
vicarious experience made available. There 
are useful suggestions for modifying activi- 
ties to suit special needs, and “leads” as to 
the handling of difficult patients. There is a 
real exemplification of the wide practical im- 
plications of the concept of individual differ- 
ences in work with defectives. 

From the viewpoint of this reviewer, it is 
unfortunate that the author attempted in the 
preface to defend the program in terms of 
psychological “principles” which are often in- 
consistent and contradictory, and sometimes 
not principles at all. The psychologically un- 
sophisticated reader might be over-impressed 
by the comment, “This belies the belief that 
punishment, drill, and rewards are justified 
in the treatment of mental defectives” (p.
12). Comments concerning the level of per-
formance attained would also be misleading
to the neophyte in the field of mental de-
ficiency: i.e., “In their dancing they show
skill, variety, imagination, and spontaneity” 
(p. 17). Without at least a reminder to the 
reader of relative standards of expectation, 
such statements are potentially dangerous. 

A recreation worker interested in the men- 
tally deficient should study this book, but 
should maintain a cautious attitude toward 
the generalizations while embracing the specific
helpful suggestions and making full use
of the source material.

Harriet E. Blodgett


Institute of Child Welfare, 
University of Minnesota 


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Dr. John G. Darley,
Graduate School, University of Minnesota, Minneapolis 14, Minnesota.


Intelligence. L. J. Bischof. New York: 
Doubleday and Company, Inc., 1954. Pp. 
33. $.85. 


Fundamentals of psychoanalytic technique.
Trygve Braatoy. New York: John Wiley 
& Sons, Inc., 1954. Pp. 404. $6.00. 

Introduction to psychiatry. O. Spurgeon 
English and Stuart M. Finch. New York: 
W. W. Norton & Company, Inc., 1954. 
Pp. 621. $7.00. 

Nature and nurture: A modern synthesis. 
John L. Fuller. New York: Doubleday 
and Company, Inc., 1954. Pp. 40. $.85. 

Methods of research. Carter V. Good and 
Douglas E. Scates. New York: Appleton- 
Century-Crofts, Inc., 1954, Pp. 920. 
$6.00. 

Community and environment, E. A. Gutkind. 
New York: Philosophical Library, 1954. 
Pp. 81. $3.75. 

Social planning in America. Joseph S. Himes.
New York: Doubleday and Company, Inc.,
1954. Pp. 59. $.95.

The deaf and their problems. Kenneth W.
Hodgson. New York: Philosophical Library,
1954. Pp. 364. $6.00.

Guidance services. J. Anthony Humphreys
and Arthur E. Traxler. Chicago: Science
Research Associates, Inc., 1954. Pp. 438.

Perception. William H. Ittelson and Hadley
Cantril. New York: Doubleday and Company,
Inc., 1954. Pp. 33. $.85.

Social psychology. Revised Edition. Otto
Klineberg. New York: Henry Holt and
Company, 1954. Pp. 578. $5.25.

The regulation of businessmen. Robert E.
Lane. New Haven, Conn.: Yale University
Press, 1954. Pp. 144. $3.75.

Job evaluation methods. Second Edition.
Charles Walter Lytle. New York: Ronald
Press Company, 1954. Pp. 507. $7.50.

Teaching tips: A guidebook for the beginning
college teacher. Second Edition. Wilbert
McKeachie and Gregory Kimble. Ann
Arbor: The George Wahr Publishing Co.,
1953. Pp. 108. $1.50.




The origins and history of consciousness.
Erich Neumann. New York: Bollingen
Series, 140 East 62nd Street, 1954. Pp.
493. $5.00.

Psychoanalysis and the education of the child.
Gerald H. J. Pearson. New York: W. W.
Norton & Company, Inc., 1954. Pp. [...].
$5.00.

Developing management ability. Earl G.
Planty and J. Thomas Freeston. New
York: Ronald Press Company, 1954. Pp.
447. $3.75.

Measurement in today’s schools. Third Edition.
C. C. Ross and Julian C. Stanley.
New York: Prentice-Hall, Inc., 1954. Pp.
485. $6.65.

The clinical interaction. Seymour B. Sarason.
New York: Harper & Brothers, 1954.
Pp. 425. $5.00.

Problems of infancy and childhood. Milton
J. E. Senn, Editor. New York: The Josiah
Macy, Jr., Foundation, 1954. Pp. [...].
$2.75.

Social science in medicine. Leo W. Simmons
and Harold G. Wolff. New York: Russell
Sage Foundation, 1954. Pp. 254. $[...].

Psychology in teaching. Henry P. Smith.
New York: Prentice-Hall, Inc., 1954. Pp.
466. $4.95.

Strengthening education at all levels. Arthur
E. Traxler, Editor. Washington, D. C.:
American Council on Education, [...].
Pp. 156. $1.50.

Techniques of counseling. Jane Warters.
New York: McGraw-Hill Book Company,
1954. Pp. 384. $4.75.

Experimental psychology. Revised Edition.
Robert S. Woodworth and Harold Schlosberg.
New York: Henry Holt and Company,
1954. Pp. 948. $8.95.

Learning theory, personality theory, and
clinical research. The Kentucky symposium.
New York: John Wiley & Sons,
Inc., 1954. Pp. 164. $3.50.


Journal of Applied Psychology 


Vol. 38, No. 6


DECEMBER, 1954 


The Relationship between Mechanical Aptitude and Proficiency 
Tests for Air Force Mechanics 


Major Thomas L. Wood 
Standards Branch, Hq. USAF, Washington, D. C. 


In September, 1952, the United States Air
Force installed a world wide proficiency testing
program to assist classification officers in
identifying airmen qualified for advancement to
higher skill levels. The program included use
of tests, custom built by professional
test technicians, to cover over 200 specific
Air Force jobs.

Development of the tests was carried out
at Headquarters Air Materiel Command
(Wright-Patterson AFB), Headquarters Air
Training Command (Scott AFB), and Headquarters
Continental Air Command (Mitchel
AFB). Special units, under the direction of
officers with psychological training, were organized
to write the tests, score the answer
sheets and provide continuous statistical
analysis to improve successive forms of each
test. Subject matter specialists were selected
from major field commands of the Air Force
from master sergeants with wide experience
in or supervising the specific jobs for which
the tests were built. On the average, five
master sergeants worked with a professional
test development technician in writing each
test. Specialty descriptions of the Airman
Career Program were used as guides in building
test outlines and in weighting the task
elements of each job.


Problem 


Several Air Force studies in the past have
shown the relationship between aptitude
scores and success in training to be significant
(3, 4). Brown and Ghiselli in a summary
of the findings of research studies since
1919 concerning the predictive power of tests
of intelligence, speed of perception, and spatial
and motor aptitudes found aptitude tests
to be very useful in predicting training success
(1). However, when the aptitude tests
are related to job proficiency measures such
as speed and amount of production, achievement
tests, and supervisor’s ratings the predictive
power drops considerably.

1 These three units were consolidated into the
[...]th Test Squadron, Mitchel AFB, Long Island,
in April 1953.

Since the Air Force had never used job 
proficiency tests as qualification standards be- 
fore 1952, it was considered necessary to de- 
termine the relationship between the new 
proficiency tests developed by airman spe- 
cialists and the Airman Classification Bat- 
tery (ACB) (2). Aptitude scores from the 
ACB are used at Air Force Military Train- 
ing Wings to determine the initial classifica- 
tion and assignment of airmen to technical 
training courses. 

Rationale 


Since the proficiency tests were being used
to measure mandatory job knowledge minimums
requisite to award of higher skills,
they were considered to be a practical criterion
of knowledge needed to be successful
on the job. An airman who fails to acquire
the required minimum knowledge of his job
is restricted in skill advancement and, consequently,
in promotion to higher rank. Passing
the appropriate proficiency test then becomes
one objective index of success on the
job.

During September, 1952, 9,234 airmen were
tested on the senior aircraft mechanic’s test.
A random sample of 461 cases was selected
to study the relationship between aptitude
scores and proficiency scores. The data were
divided into four cells on the basis of pass-fail²
on the proficiency test and qualified-not
qualified on the mechanical aptitude test.

It will be noted in Table 1 that 36 (8%) 
of the airmen failed to attain a qualifying 
score on the aptitude test, while 115 (25%) 
of the airmen tested failed the proficiency 
test. Of those failing the aptitude test, 33 
(92%) also failed the mechanic’s proficiency 
test. 


Table 1

Relationship between Performance of Aircraft
Mechanics on Mechanical Aptitude
and Proficiency Tests

                     Proficiency Test
Aptitude Test    Failed       Passed       Total
Passed           82 (18%)     343 (74%)    425 (92%)
Failed           33 (7%)      3 (1%)       36 (8%)
Total            115 (25%)    346 (75%)    r = .61*

Source: Air Force sample, N = 461, tested September 1952.

* Significant at the 1% level of confidence.


The Pearson r between the two tests was
found to be .61, significant at the 1% level
of confidence.
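
The percentages quoted for the aircraft mechanics can be rechecked from the four cell counts of Table 1. The sketch below is only an illustration of that arithmetic; the dictionary layout and the helper name `pct` are assumptions introduced here, not anything described in the article.

```python
# Cell counts from Table 1 (aircraft mechanics, N = 461).
# Keys are (aptitude result, proficiency result).
counts = {
    ("qualified", "failed"): 82,
    ("qualified", "passed"): 343,
    ("not qualified", "failed"): 33,
    ("not qualified", "passed"): 3,
}


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as printed in the tables."""
    return round(100 * part / whole)


n = sum(counts.values())

# Row and column totals needed for the figures quoted in the text.
below_aptitude = counts[("not qualified", "failed")] + counts[("not qualified", "passed")]
failed_proficiency = counts[("qualified", "failed")] + counts[("not qualified", "failed")]

share_below_aptitude = pct(below_aptitude, n)            # airmen under the aptitude minimum
share_failed_proficiency = pct(failed_proficiency, n)    # airmen failing the proficiency test
fail_rate_given_low = pct(counts[("not qualified", "failed")], below_aptitude)

print(n, share_below_aptitude, share_failed_proficiency, fail_rate_given_low)
```

The last figure is a conditional rate: it is taken over the 36 low-aptitude airmen only, not over the whole sample.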

From an Air Force population of 2,426 airmen
tested in November, 1952 on the senior
vehicle mechanics’ proficiency test, a random
sample of 303 airmen was selected.

As indicated in Table 2, 23 of 39 (59%)
airmen who were below standards in mechanical
aptitude passed the proficiency test
while only 16 (41%) failed the proficiency
exam. In this case it is obvious that mechanical
aptitude is not so highly related to
the ability to pass a proficiency test custom
built to fit the job. The Pearson r in this
case was .35, significant at the 1% level of
confidence.


Table 2

Relationship between Performance of Vehicle
Mechanics on Mechanical Aptitude
and Proficiency Tests

                     Proficiency Test
Aptitude Test    Failed       Passed       Total
Passed           26 (9%)      238 (79%)    264 (88%)
Failed           16 (5%)      23 (7%)      39 (12%)
Total            42 (14%)     261 (86%)    r = .35*

Source: Air Force sample, N = 303, tested November 1952.

* Significant at the 1% level of confidence.


2 Passing on the Proficiency Tests was established
at a Standard Score of 80, based on the total Air
Force population tested (Standard Score distribution
mean 100, std. dev. 20). Aptitude minimum
scores are established for each Career Field as a result
of research conducted by the Human Resources
Research Center to determine aptitude scores predictive
of success in technical training courses (5).
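
Footnote 2 fixes the passing mark at a Standard Score of 80 on a scale with mean 100 and standard deviation 20. A minimal sketch of what that cutoff implies follows. The normal-distribution step is an assumption added here for illustration; the article does not state the shape of the score distribution.

```python
from statistics import NormalDist

SCALE_MEAN = 100     # Standard Score mean, from footnote 2
SCALE_SD = 20        # Standard Score standard deviation, from footnote 2
PASSING_SCORE = 80   # passing mark, from footnote 2

# The cutoff expressed in standard-deviation units below the mean.
z_cut = (PASSING_SCORE - SCALE_MEAN) / SCALE_SD

# ASSUMPTION: if Standard Scores were normally distributed in the full
# tested population, this is the share expected to fall below the cutoff.
expected_fail_share = NormalDist(SCALE_MEAN, SCALE_SD).cdf(PASSING_SCORE)

print(z_cut, round(expected_fail_share, 3))
```

The cutoff sits one standard deviation below the mean. The failure rates observed in the three samples (25%, 14%, and 25%) describe particular specialties and need not match a figure derived for the whole tested population.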

A further random sample of 189 airmen
was drawn from 1,079 senior weapons mechanics
tested in November 1952. Table 3
shows that 18 men (10%) were below the
minimum in mechanical aptitude. Of these,
only seven (39%) passed the proficiency test,
while 11 (61%) failed on the proficiency
score. The Pearson r was .35, significant at
the 1% level of confidence.


Table 3

Relationship between Performance of Weapons
Mechanics on Mechanical Aptitude
and Proficiency Tests

                     Proficiency Test
Aptitude Test    Failed       Passed       Total
Passed           36 (19%)     135 (71%)    171 (90%)
Failed           11 (6%)      7 (4%)       18 (10%)
Total            47 (25%)     142 (75%)    r = .35*

Source: Air Force sample, N = 189, tested November 1952.

* Significant at the 1% level of confidence.


Summary and Conclusions 


Table 4 summarizes the data concerning 
aircraft mechanics, vehicle mechanics, and 
weapons mechanics. When each group is di- 
vided into a dichotomy of high aptitude and 
low aptitude, it can be readily seen that the 
failure rates for the low aptitude men are 
much higher on the appropriate proficiency 
test. 


Table 4

Failure Rates on Proficiency Tests for High
and Low Aptitude Mechanics

Group                Aptitude    Total N    N Failed    Fail Rate
Aircraft Mechanics   High        425        82          19%
                     Low         36         33          92%
Vehicle Mechanics    High        264        26          10%
                     Low         39         16          41%
Weapons Mechanics    High        171        36          21%
                     Low         18         11          61%
Total                High        860        144         17%
                     Low         93         60          65%

Source: Air Force sample, N = 953, tested September
and November 1952.
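
The failure rates in Table 4 follow directly from the counts in Tables 1 to 3. The sketch below recomputes them, including the pooled rows; the variable names and data layout are illustrative assumptions, and only the counts come from the article.

```python
# (group, aptitude level) -> (total N, number failing the proficiency test),
# copied from the counts reported in Tables 1-3.
groups = {
    ("Aircraft", "High"): (425, 82),
    ("Aircraft", "Low"): (36, 33),
    ("Vehicle", "High"): (264, 26),
    ("Vehicle", "Low"): (39, 16),
    ("Weapons", "High"): (171, 36),
    ("Weapons", "Low"): (18, 11),
}


def fail_rate(total, failed):
    """Failure rate as a whole-number percentage."""
    return round(100 * failed / total)


# Per-group failure rates.
rates = {key: fail_rate(*value) for key, value in groups.items()}

# Pool the three specialties within each aptitude level.
totals = {}
for (_, level), (total, failed) in groups.items():
    running_total, running_failed = totals.get(level, (0, 0))
    totals[level] = (running_total + total, running_failed + failed)

pooled = {level: fail_rate(total, failed) for level, (total, failed) in totals.items()}

for key, rate in rates.items():
    print(key, f"{rate}%")
print(totals, pooled)
```

Pooling sums the counts before dividing, so the total rows are weighted by group size rather than being a simple average of the three group rates.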




Since not all airmen attend technical schools 
where they can be evaluated shortly after 
they receive aptitude tests, it is important for 
the Air Force to have aptitude scores which 
have value in predicting not only success in 
training but also relative performance after 
experience on the job. 

From the data presented it would seem 
that present mechanical aptitude tests pre- 
dict future assimilation of job knowledge to 
a usable degree. 


Received November 9, 1953. 


References 


1. Brown, C. W. and Ghiselli, E. E. The relationship
between the predictive power of aptitude
tests for trainability and for job proficiency.
J. appl. Psychol., 1952, 36, 370-372.

2. Gordon, Mary Agnes. Validity of the Airman
Classification Battery (AC-1A) for Career
Guided Samples. Research Note Pers 51-13.
Human Resources Research Center, Lackland
Air Force Base, Texas, July 1951.

3. Gordon, Mary Agnes. A Method of Establishing
Minimum Qualifying Scores for Entrance to
Air Force Technical Schools. Technical Report
52-4. Human Resources Research Center,
Lackland Air Force Base, Texas, November
1952.

4. Gragg, D. B. and Gordon, Mary Agnes. Validity
of the Airman Classification Battery AC-1.
Research Bulletin 50-3, 2nd edition, September
1951. Human Resources Research Center,
Lackland Air Force Base, Texas.

5. Personnel Evaluation Manual, Air Force Manual
35-8, 1 July 1953, Department of the Air
Force, Washington 25, D. C.


The Journal of Applied Psychology
Vol. 38, No. 6, 1954


A Comparative Evaluation of Two Approaches to Job-Knowledge
Test Construction¹


Harry M. Mason²


University of Illinois 


Writers concerned with construction of per- 
sonnel tests (1, 2, 3, 4, 7) advise somewhat 
different procedures for selecting and editing 
job-knowledge or trade test contents prior to 
tryout, but evidence that alternative ap- 
proaches result in tests having different rela- 
tionships to criteria is lacking. Experience 
in editing test content assembled by teams of 
expert workers under the guidance of test 
technicians suggested to the writer that two 
general approaches to the item writing task 
are used, one of which may be called the job- 
requirements approach and the other the job- 
experience approach. It seemed likely that 
the test items produced through these two 
approaches would exhibit corresponding dif- 
ferences in their relationships to criteria of 
job success. 

The present study reports an empirical try- 
out of three newly constructed tests of job 
knowledge applicable to airplane and engine 
mechanics maintaining piston-engined air- 
craft, and three existing tests from the Air- 
man Classification Battery. Two of the new 
tests were constructed in accordance with the
job-experience approach to test construction
and one in accordance with the job-requirements
approach. Results show the degree to
which tests assigned to each approach are
consistent in their relationship to criteria, and
differences between the patterns of relationships
to criteria characteristic of tests assigned
to the two approaches.

1 [...] 33(038)-25726, monitored by the Human Resources
Research Center. Permission is granted for reproduction,
translation, publication, use and disposal in
whole and in part by or for the United States Government.

2 The writer wishes to acknowledge help and criticism
given by Prof. L. H. Lanier.


The Two Approaches 


The job-requirements approach strives to measure
mastery of formally stated job requirements.
Tests resulting from this approach are essentially
examinations over training courses, job handbooks,
and similar materials. The job-requirements approach
seems likely to dominate production of
test items whenever rapid production or [...]
of tests is required, since it allows the item writer
to capitalize upon the organization already existing
in published materials.

The job-experience approach to test construction
attempts to measure mastery of one or more
topics representing distinctive learning opportunities
afforded by a job. Test items may relate to
what the worker does, to what he may [...]
happen on the job, or to knowledge resulting
from job aspects having no suspected importance.
Since this approach requires the test constructor
to select and organize learning opportunities
offered by the job, it is difficult to use
when tests must be produced quickly.

The two approaches differ in underlying assumptions.
The job-requirements approach assumes
that workers differ in the degree to which
they meet “minimum” job requirements, and
that any excess over the minimum level of this
type of knowledge results in enhanced job performance.
The job-experience approach assumes
that all workers retained on the job exhibit
above-minimum job knowledge, but that the
quality of the w 


best reflected j 
Opportunities t 


a 

ons upon entry to the job ind 
sts 

nt completely relevant, tere 


idea 
In anything less than the ideni 


experience 
tered, 


Approaches to Job-Knowledge Test Construction 


Requirements Centered Tests. Alternate forms
of the Electrical Information and Mechanical
Principles Tests of the Airman Classification
Battery were used.3 They were assigned to the
requirements-centered category because their content
and the descriptions given by Gragg and
Gordon (5) indicate that they are concerned
with principles taught in schools which prepare
men to meet requirements for entry into mechanical
jobs. The Electrical Information Test
contains 30 four-choice items concerned with
circuit diagrams and electrical principles. The
Mechanical Principles Test presents 15 picture-type
items relating to machines encountered in
everyday life.

The Training Research Laboratory (TRL)
Aviation Mechanics Technical Knowledge Test
was constructed for the present study. It consists
of 60 four-choice items chosen from data
of a long-range study employing 300 job-knowledge
test items obtained from Air Force and
Navy aviation mechanics schools.4 Items selected
were the best discriminators between airman
trainees in early and late phases of technical
schools which are prerequisites for entry
into the apprentice level of the Air Force job of
Airplane and Engine Mechanic. Most of the
items relate to the operation or malfunctioning
of aircraft components.

Experience Centered Tests. An alternate form
of the Aviation Information Test of the Airman
Classification Battery was assigned to the experience-centered
category. It is intended to measure
inductees’ attempts to gain contact with
aviation work. The test contains 30 four-choice
items.

The TRL Aviation Information Test was constructed
from information contained in Jane (6).
It has 30 five-choice items relating to relative
cruising speeds and other performance characteristics
of well known civilian and military
airplanes, names of manufacturers of airplane
equipment, and the equipment and components
employed in different airplanes. Its content is intended
to be continuous in type with that in the
Airman Classification Battery Aviation Information
Test; it refers to information more readily
available to aviation workers than to men outside
aviation jobs.

The two aviation information tests were assigned
to the experience-centered category because
their content may be learned through experience
on the job, rather than through attempts
to meet formal job requirements through schools.

3 Air Force tests were made available by the Personnel
Research Laboratory, Human Resources Research
Center, Lackland Air Force Base. Permission
to employ these tests is gratefully acknowledged.

4 This study is being conducted by Dr. E. [...]
under the present Air Force contract. Grateful
acknowledgment is made for permission to use these
items.


The TRL Maintenance Techniques Test was 
made up as a result of interviews with 50 airmen 
recommended as expert airplane and engine me- 
chanics. Each interview was guided to cover all 
major airplane systems on the aircraft with which 
the interviewee was most familiar. Statements 
of interviewees led directly to test item content, 
or to intuitive guesses concerning the verbal self- 
guidance employed by good mechanics in the 
tasks mentioned. The test contains 76 four- 
choice items. It emphasizes airframe rather than 
powerplant systems. Most of the items relate to 
operations performed by the mechanic, rather 
than to the mechanical operation of aircraft 
components. 

Administration of Tests. Slightly less than 
four hours were required for subjects to answer 
the entire battery of six tests and to give per- 
sonal information and peer ratings. Air Force 
tests were finished within time limits and were 
scored according to prescribed formulae; new 
tests were given without time limits and were 
scored for number of correct responses. 

Subjects. Subjects were 204 airplane and engine
mechanics chosen as every n-th name from
alphabetical rosters of all airplane and engine
mechanics at apprentice to supervisor or technician
levels in three airplane maintenance squadrons
at Lowry Air Force Base. In sampling, n
was chosen as every third or every fourth man,
to give as nearly 70 men per squadron as possible.
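
The sampling rule just described can be illustrated with a short sketch. It is only an illustration under stated assumptions: the roster names, the roster size of 250, and the function names `every_nth` and `choose_interval` are hypothetical and do not come from the study, which reports only that every third or every fourth name was taken so as to yield about 70 men per squadron.

```python
def every_nth(roster, n, start=0):
    """Return every n-th name from an alphabetized roster, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sorted(roster)[start::n]


def choose_interval(roster_size, target=70, candidates=(3, 4)):
    """Pick the sampling interval whose sample size comes closest to `target`."""
    def sample_size(n):
        return len(range(0, roster_size, n))
    return min(candidates, key=lambda n: abs(sample_size(n) - target))


# Hypothetical squadron roster of 250 mechanics (not the study's data).
roster = [f"Airman {i:03d}" for i in range(250)]
n = choose_interval(len(roster))
sample = every_nth(roster, n)
print(n, len(sample))
```

With a roster of this assumed size, an interval of four comes nearer to 70 men than an interval of three, which is the kind of choice the text describes.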

Criteria. Personal data statements concerning 
the length of aircraft maintenance experience 
were the principal criteria employed. In treat- 
ment of results, men claiming six years or more 
of experience are called high-experience men, 
those with less than six years of experience are 
called low-experience men. The distribution of 
experience, with a major mode at less than two 
years and a minor mode at more than eight years 
made a breakdown at this point convenient. Di- 
vision at this point is also in accord with the 
presumption that the low-experience group is 
composed primarily of men who have not yet had 
time to become fully competent, and that the 
high-experience men have, in general, met what- 
ever effective minimum job requirements exist. 
High-experience men averaged 10.4 years, low- 
experience men 1.7 years of experience. 

Peer ratings were also employed. Each mechanic
ranked for competence the six men whose
work habits he knew most thoroughly. Men
ranked 3.0 or less, on the average, by four or
more peers are called “Good”; those ranked 3.1
or more on the average are called “Poor”; men
not ranked by as many as four peers are regarded
as “Not Rated.” After inspection of test
results it was seen that at each experience level,
the subgroup rated Poor was different from
others, but that subgroups rated Good or Not
Rated had essentially the same mean scores on
all tests, there being no significant mean differences.
Consequently, in treatment of results, the
subgroup rated Poor is differentiated from one
designated as Others at each experience level.
The presumption that the high-experience men
are the more competent [...] that they were at least
[...] experience at the outset [...] the low-experience men.
[...] half of the low-experience men had had all [...]
maintenance experience on one type of airplane.
None of the [...]. One eleventh of the high-experience men and
one-third of the low-experience men were rated
Poor in the peer ratings.

Table 1

Mean Scores of Airplane and Engine Mechanic Criterion Subgroups

                                              Criterion Subgroup*
                                     High-Exper.  High-Exper.  Low-Exper.  Low-Exper.
                                     Poor         Other        Poor        Other
Test                                 (N = 5)      (N = 50)     (N = 51)    (N = 98)
Experience Centered:
  TRL Aviation Information**         14.1         20.7         16.2        15.3
  Air Force Aviation Information**   19.3         24.2         21.6        21.1
  TRL Maintenance Techniques**       38.9         46.2         41.1        39.4
Requirements Centered:
  TRL Technical Knowledge***         26.1         34.5         30.2        34.6
  Air Force Electrical Information   16.1         21.0         19.9        20.7
  Air Force Mechanical Principles    10.8         11.0         11.4        11.4

* Criteria employed were amount of aircraft maintenance experience
and peer ratings. See text for method used to establish subgroups.
** Test separates High-experience “Other” subgroup from all three
remaining groups by a highly significant difference (1 per cent level).
*** Test separates either “Poor” group from remaining subgroups by
highly significant difference (1 per cent level).

Aptitude Indexes were available for only 117
men, all in the low-experience category. The
mean Mechanical Aptitude Index was 6.8, SD
1.33; the mean Technician Specialty Index, comparable
to a group intelligence test score, was
6.4, SD 1.88. Neither of these shows a high degree
of selection, since Gragg and Gordon (5)
found means for both Indexes to be 6.2 in 1,000
presumably unselected inductees in 1949. Air
Force Specialty Codes assigned the men ranged
from the “3” to the “7” level, but these were not
employed as criteria, since they had not all been
assigned through the present uniform procedure.

Among the low-experience men, the peer rating
of Poor does not appear to be associated with
low Mechanical Aptitude, the proportions of the
mechanics rated Poor and of the 81 rated Other having
Mechanical Aptitude Indexes of 7 or higher being
53 and 55 per cent, respectively. The proportion
of Poor having Technician Specialty Indexes
of 7 or higher was 33 per cent, while that
for Others was 48 per cent. This difference is
nearly significant.
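
The criterion subgroups described under Criteria can be sketched in code. This is a minimal illustration, not the author's procedure: the function names are hypothetical, and rounding the average rank to one decimal place is an assumption made here because the text gives the cutoffs only as “3.0 or less” and “3.1 or more.”

```python
from statistics import mean


def experience_level(years):
    """Six years or more of claimed experience is 'High'; less is 'Low'."""
    return "High" if years >= 6 else "Low"


def peer_rating(ranks):
    """Classify a man from the ranks (1 = best of six) given him by his peers."""
    if len(ranks) < 4:
        return "Not Rated"            # fewer than four peers ranked him
    average = round(mean(ranks), 1)   # assumption: averages kept to one decimal
    return "Good" if average <= 3.0 else "Poor"


def criterion_subgroup(years, ranks):
    """Good and Not Rated men are pooled as 'Other', as in the treatment of results."""
    label = "Poor" if peer_rating(ranks) == "Poor" else "Other"
    return f"{experience_level(years)}-experience {label}"


# Hypothetical men, for illustration only.
print(criterion_subgroup(10.4, [6, 6, 6, 6]))
print(criterion_subgroup(1.7, [1, 2]))
```

Pooling Good with Not Rated follows the text's observation that those two subgroups had essentially the same mean scores on all tests.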


Results 


Preliminary studies had shown that all tests
distinguish significantly between groups of inductees
and the working mechanic group.5
In the present study, means of criterion
subgroups are compared to determine (a)
whether or not a common pattern of subgroup
mean differences characterizes the tests
assigned to each major approach; and (b)
whether or not the pattern of criterion subgroup
means produced by tests assigned to
one approach differs meaningfully from the
pattern produced by the tests assigned to the
other. Brief consideration is given to correlations
between tests, and between tests and
Aptitude Indexes.

5 In preliminary studies, 108 airman inductees [...]
airplane [...] at end of [...] level [...]

As shown in Table 1, all of the experience-centered
tests show the same rank order of
subgroup means. High-experience mechanics
rated “Other” have highest scores, followed
by low-experience mechanics rated
Poor, low-experience Other and high-experience
Poor. For each test the difference between
the high-experience Other and remaining
subgroups is significant beyond the one
per cent level, while differences separating the
remaining subgroups one from the other are
not significant.

Requirements-centered tests do not present
a completely consistent pattern of subgroup
means, but if the Mechanical Principles Test,
which is relatively short and unreliable, is excluded,
and fractional mean differences are
ignored, a consistent pattern is evident. Both
the Technical Knowledge and the Electrical
Information tests place the low-experience
Other subgroup and the high-experience Other
subgroups first, low-experience Poor next, and
high-experience Poor last. Both of the subgroups
rated Poor are reliably lower in mean
score than either of the Other subgroups on
the Technical Knowledge Test. No significant
differences were produced by the other
two requirements-centered tests.

The high-experience Poor subgroup (N = 5)
occupies bottom position in all tests; it therefore
does not assist in differentiating between
the two approaches. The most probable explanation
for the low test scores of this subgroup
is that it represents a recognized minority
of poor mechanics who manage to keep
their association with the job and the Air
Force in spite of limited ability or poor motivation.
Three of the five men were ranked
last unanimously by all mechanics who listed
them among their closest working acquaintances.
The characteristics of tests adequate
to isolate members of this subgroup do not
appear to be critical.


Experience-centered tests discriminate unambiguously
in favor of high-experience mechanics
generally approved by their peers.
Requirements-centered tests, to the extent 
that they discriminate among the subgroups, 
tend to isolate the subgroups rated Poor, but 
do not distinguish knowledge presumably 
learned primarily through experience. If the 
experience-centered tests measured nothing 
except length of tenure on the job, they 
would have little practical utility. To de- 
termine whether or not some mechanics gain 
this knowledge quickly, a check was made to 
see how many of the low-experience mechan- 
ics scored above the mean of the high-experi- 
ence Other subgroup, and to examine their 
personal data and rating characteristics. A 
total of 28 low-experience mechanics for 
whom aptitude indexes were available scored 
above the mean of the high-experience Other 
subgroup on either the TRL Aviation Infor- 
mation Test or the TRL Maintenance Tech- 
niques Test, or both. The breadth of experi- 
ence represented was substantially the same 
as that of other low-experience men. Slightly, 
but not significantly more of the 28 men 
were rated Poor than among low-experience 
men generally. The group was outstandingly 
high on the Technical Knowledge Test, 21 
of the 28 having scores of 35 or higher. All 
but two of these men had both Mechanical 
and Technician Specialty Aptitude Indexes of 
6 or higher, their mean Mechanical Index be- 
ing 8.0 and their mean Technician Specialty 
Index being 7.6. Both means are signifi- 
cantly higher than the means of the remain- 
der. For 11 of the 28 men, Mechanical Index 
was higher, for 15, both were the same, and 
for two, the Technician Specialty Index was 
the higher. Thus it appears that experience- 
centered knowledge may be mastered early in 
an airman’s career, and that high aptitudes 
are indicative of ability to master it. On the 
other hand, these men’s peers are not inclined 
to regard them as outstanding mechanics. 
Low-experience men rated Poor have a 
slight tendency to score above other low-ex- 
perience mechanics on experience-centered 
tests. As indicated earlier, the Mechanical 
Aptitudes of the low-experience mechanics 
rated Poor are substantially equal to those of 
the Others, but their Technician Specialty
Indexes are nearly significantly lower. The
low status of Poor subgroups on the Tech- 
nical Knowledge Test may be due to this 
test’s higher relationship to general intelli- 
gence, indicated by its correlation with apti- 
tudes shown in Table 2. The low-experience 
Poor subgroup’s better status on the experi- 
ence-centered tests could not be due to better 
measured mechanical aptitude. It might pos- 
sibly be due to more realistic attitudes toward 
the job, resulting in effective but not spec- 
tacular job adjustment. Since all three dif- 
ferences are small and non-significant, they 
could be due to chance. 

Table 2 gives means and SDs of tests, split-half
reliabilities, average correlation of each
test with other tests, and correlation of each
test with Mechanical Aptitude Index and
Technician Specialty Index, for the 117 low-experience
mechanics for whom aptitude indexes
were available. Correlations presented
indicate the TRL Aviation Information Test
to be more independent of other tests and
aptitude indexes than any other test used.
The Air Force Aviation Information Test has
correlations more nearly like those of the requirements-centered
tests. The Air Force [...]
extent upon differences in reading ability,
rather than aviation information. On the
whole, the experience-centered tests are less
highly correlated with aptitude indexes than
are the requirements-centered tests. Since
these tests are related to criteria among both
low-experience and high-experience mechanics,
it would appear to be worthwhile to experiment
with a mechanical aptitude index
depending somewhat more upon tests of
knowledge spontaneously acquired before entering
the Air Force. It should be borne in
mind, however, that the discussion relating
to correlations is based only upon differences
in their size, without regard for significance
of these differences. More studies are needed
before a firm interpretation of these relationships
is attempted.

A limitation applying to all results of the
present study is that the study is cross sectional;
differences between mature and less-experienced
mechanics could possibly be due
to selective attrition. This is not likely, considering
the personal data differences between
the two experience levels, but only longitudinal
studies, employing broad batteries of
tests, can establish surely the changes in job-knowledge
which differentiate between mature
workers who have demonstrated satisfactory
adjustment to the job and beginners

Table 2 


Test Score Distribution Characteristics (N = 204)


Test 


Correlationst 
ea | ee 


AF Electrical Information 20.5 f o 2 48 50 a2 

AF Mechanical Principles 11.3 25 2 43 58 42 

Experience Centered: ý 50 29 56 61 
AF Aviation Information 22.0 45 

TRL Aviation Information 16.7 51 cn 38 51 48 

TRL Maintenance Techniques 413 a A 30 23 34 

x 2 40 58 37 


. 




whose aptitudes are known only in terms of 
test scores. 

The most probable explanation for the different
relationship to criteria shown by experience-centered
and requirements-centered
job-knowledge tests is that requirements-centered
tests emphasize formal requirements
some of which are not functionally effective,
and that for the proven formal requirements
[...] with tests, critical levels of mastery
have not been incorporated into test materials.
Since some discrepancies between
[...] requirements and learning opportunities
accepted by good workers on the job may
be expected always to exist, there appears to
be a continuing need for empirical studies
aimed at improving the coverage of job
knowledge tests.


Received December 15, 1953. 




References 


1. Adkins, Dorothy C. Construction and analysis of 
achievement tests. Washington: U. S. Gov- 
ernment Printing Office, 1947. 

2. Cronbach, L. J. Essentials of psychological test- 
ing. New York: Harper Brothers, 1949. 

3. Flanagan, J. C. Critical requirements: a new ap- 
proach to employee evaluation. Personnel 
Psychol., 1949, 2, 419-425. 

4. Flanagan, J. C. The use of comprehensive ra- 
tionales in test development. Educ. Psychol. 
Measmt., 1951, 11, 151-155. 

5. Gragg, D. B. and Gordon, Mary Agnes. Validity
of the Airman Classification Battery AC-1. 
Research Bulletin 50-3, Personnel Research 
Laboratory, Human Resources Research Cen- 
ter, Lackland Air Force Base, 1951. 

6. Jane, F. T., et al. Jane’s all the world’s aircraft. 

London: Sampson, Low, Marston, 24, 1934 to 
52, 1951. 

7. Thorndike, R. L. Personnel selection test and 
measurement techniques. New York: John 
Wiley and Sons, Inc., 1949. 




THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Retest-Reliability by a Movie Technique of Test Administrators’
Judgments of Performance in Process *


Arthur I. Siegel 


Institute for Research in Human Relations, Philadelphia 


In work sample performance testing, judgments
are often made by the test administrator
regarding the manner in which an examinee
performs the components of a specific
task. For instance, in a Drill Point Grinding
Performance Test, the test administrator is
asked to make judgments on the manner in
which the examinee holds the drill while
grinding, on whether the examinee wears
loose clothing that could snag in the grinding
wheel, on whether the examinee inspects the
grinding wheel for cracks prior to grinding,
on whether the examinee oscillates the drill
while grinding, etc. One of the problems
connected with this type of judgment is that
the perceptions of the test administrators
themselves may vary from time to time and
thus may represent an uncontrolled variable
in work sample performance testing. Clothing
may be perceived on one day as being
loose enough to snag in [...]
while two weeks later [...]
worn in the same manner [...]
as being perfectly safe. [...]
in which this type of variability [...] partially
controlled is to [...]
judged gross enough [...]. If a definite [...]
into each element scored, and if the observations
required are kept gross, then the danger of
perceptual variability in examiners may be
minimized.

It is still incumbent on the test user to
ascertain the reliability of the observations of
the people who act as test administrators.
The ideal method for determining the consistency
of an individual examiner is the
situation in which the examinee’s performance
is held constant over two separate occasions
and the examiners’ perceptions allowed to
vary. Since the stimulus configuration remains
constant, any unreliability shown may
then be attributed to variation within the examiner.
However, unfortunately, no one can
possibly perform the same job in exactly the
same manner on two separate occasions. A
method by which performance may be held
constant is to take a motion picture of the
examinee performing the job. The motion
picture may then be shown on two separate
occasions and the examiner asked to score the
motion picture rather than to score the
live performance. Thus the stimulus situation
is held constant over the two time intervals
and any variation shown may be attributed
to variation within the examiner.
Two assumptions of this method are that the
movie situation presents the same stimulus
configuration to the examiner as does the
actual work sample performance test situation
and that the examiner scores the movie
in the same manner as he would score the
actual work sample performance test. Further
assumptions are the usual assumptions
made in any test-retest reliability check. The
principal disadvantage of the motion picture
technique is that movies are difficult and expensive
to produce. This is especially true
for long, involved jobs.

The purpose of this paper is to present a
method of using the movie technique for determining
intra-examiner reliability and to
present the intra-examiner reliabilities obtained
from a small group of test administrators.

* [...] small portion [...] the data gathered under Contract [...]


Method

A 16 mm. black-white movie was made of a
Naval Aviation Structural Mechanic taking the
Drill Point Grinding Work Sample Performance
Test. The film was unrehearsed and the instructions
given the subject, a randomly selected
Aviation Structural Mechanic, were “to grind a
drill as he would ordinarily do it.” He was told
that we were going to take movies of him while
working. The motion picture cameras
and [...] were not hidden, but their presence
and the knowledge that his behavior was being
photographed did not seem to affect the S’s behavior.

The motion picture was then first shown in the
[...] room of VC-4, Naval Air Station, Atlantic
City, to five Chief Aviation Structural
Mechanics. These chiefs had previous experience
in work sample performance test administration
and were moderately well informed in the
general principles of work sample performance
test administration. The movie was reshown to
the same chiefs one month after its first administration.
One month is usually accepted as
sufficient time interval for forgetting of original
responses. Moreover, the chiefs did not
know that they would be asked to make exactly
the same observations on two separate occasions.
Therefore, there was little reason for them to
try to remember their original responses.

The chiefs were asked to fill in the Movie
Evaluation Form (see Figure 1) during each
showing of the motion picture. The Movie
Evaluation Form contained fourteen items such
as “Did the examinee check the tool rest for
proper distance from the periphery of the grinding
wheel?”; “Did the examinee ever adjust the
tool rest while the grinding wheel was in motion?”;
“Did the examinee wear loose clothing or
clothing that could snag in the grinding wheel?”;
“Did the examinee check the shank of the drill
for bends and burns?”; etc. Sufficient light was
allowed in the “theater” so that the chiefs could
fill in the forms as the appropriate action was
performed. Thus the motion picture situation
was as close as possible to actually scoring a
work sample performance test.

Care and Use of Tools

1. Did the examinee check the tool rest for proper distance from periphery of the grinding wheel? Yes___ No___
2. Did the examinee ever adjust the tool rest while the wheel was in motion? Yes___ No___
3. Did the examinee use a coolant while grinding the drill? Yes___ No___

Procedure

4. Did the examinee read the “Examinee Instructions?” Yes___ No___
5. While grinding, did the examinee oscillate the drill so that heel was moved along the surface of the grinding wheel? Yes___ No___
6. Did the examinee hold the shank slightly lower than the point while grinding? Yes___ No___
7. Did the examinee alternate from flute to flute while grinding? Yes___ No___
8. Did the examinee grind one flute and then the other? Yes___ No___
9. Did the examinee check the shank of the drill for bends and burns? Yes___ No___
10. Did the examinee secure the grinding wheel? Yes___ No___
11. Did the examinee “police up” the work area when securing? Yes___ No___

Safety Precautions

12. Did the examinee wear eyeshields or goggles while grinding? Yes___ No___
13. Did the examinee tap the grinding wheel or check it for cracks prior to its use? Yes___ No___
14. Did the examinee wear loose clothing or clothing that could snag in the grinding wheel? Yes___ No___

Fig. 1. Movie evaluation form.


Results 


The results in terms of the consistency of 
the observations of the Chief Structural Me- 
chanics who viewed on two separate occasions 
the drill point grinding motion picture are 
presented in Table 1. 


Table 1 


Intra-Examiner Consistency for Measurements of 
Performance in Process 


Observer    Per Cent Consistency
A            85.6
B            71.4
C           100.0
D            64.3
E            92.8
Mean         82.8


In preparing Table 1, we called S consistent
on an item if he answered the item on the
second showing of the movie in exactly the
same manner that he did on the first showing.
Thus:

\[
\text{Intra-examiner consistency} =
\frac{\text{Number of items answered in exactly the same manner on each showing of motion picture}}
     {\text{Total number of items on questionnaire}} \times 100
\]

The grand mean for intra-examiner agreement
was 82.8% with a range from 64.3%
to 100%. This mean of 82.8% agreement
would usually be considered adequate if converted
into a correlation coefficient and interpreted
as correlation coefficients are usually
interpreted. Of course, these intra-examiner
reliability estimates are based on only one
motion picture. The danger of generalization
from one measure of the reliability of
observations of performance in process to all
observations of performance in process is
self-evident.

In view of the range shown, the desirability
of determining the reliability of the observations
of examiners prior to assigning them to
test administrative duties is also indicated.
If all examiners show low consistencies, then
either the examiner training has been poor or
the test itself is inadequate. Naturally, only
those examiners with high consistencies are
worthy of consideration as test administrators.

The problem of how high examiner consistency
must be before it is high enough remains
open. A second problem remaining
open is that of the effect of increasing the
number of judgments to be made on intra-examiner
reliability. That is, will the Spearman-Brown
prophecy formula hold in this
situation?
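
The consistency index defined above, and the Spearman-Brown projection raised in the last question, can be sketched briefly. The answer sheets below are hypothetical, not the chiefs' actual responses, and the function names are illustrative. The Spearman-Brown function is the standard formula for a reliability coefficient r when the number of judgments is multiplied by k; whether it applies to these percentage-agreement figures is exactly the open question, so the code takes no position on it.

```python
def percent_consistency(first, second):
    """Per cent of items answered identically on the two showings."""
    if not first or len(first) != len(second):
        raise ValueError("both showings must cover the same, non-empty item list")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


def spearman_brown(r, k):
    """Projected reliability when the number of judgments is multiplied by k."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical fourteen-item answer sheets for one examiner.
first_showing = ["Yes"] * 14
second_showing = ["Yes"] * 12 + ["No"] * 2
print(round(percent_consistency(first_showing, second_showing), 1))  # 12 of 14 items agree
print(round(spearman_brown(0.5, 2), 3))  # a coefficient of .50, judgments doubled
```

With fourteen items, each changed answer moves the index by about seven percentage points, which may help in reading the spread among observers in Table 1.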


Summary 


A motion picture technique was described
and the results of its use in determining the
intra-examiner reliability for performance test
administrators’ observations of performance
in process were indicated. The mean intra-observer
consistency for observations of elements
of a drill point grinding task on two
separate showings of the movie was adequate.
However, the range of consistency
was great enough to warrant a recommendation
in support of careful investigation of the
intra-examiner reliability prior to the administration
of work sample performance
tests.


Received November 25, 1953, 




THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Influences on Merit Ratings 


Aaron J. Spector 


Officer Education Research Laboratory, Air Force Personnel and Training 
Research Center, Maxwell Air Force Base * 


Several sources of errors in merit ratings are
well known to users of these devices. Laboratory
and field investigations have identified
errors which may be classified as: (a) characteristic
biases of classes of raters, e.g.,
men, women, peers, etc.; and (b) universal
errors, e.g., halo effect, error of central tendency,
etc.1 Somewhat neglected is the fact
that the stimulus, the ratee’s behavior, may
contribute errors which are not ordinarily
considered. His total behavior is complex
[...] some behaviors which are pertinent
and some which aren’t pertinent to the
factors on which he is being rated. Evaluation
of the pertinent behaviors independently
of all others may require special training of
raters. This may be especially true when
factors being evaluated are in themselves
[...] and subjectively loaded, e.g., potentialities
of the ratee, cooperativeness,
quality of work, etc. Irrelevant characteristics
may be so influential as to seriously
bias the evaluations on the desired characteristics.
The research presented here has
been designed to investigate the effects of
irrelevant ratee behaviors on ratings assigned
to him.

A ratee characteristic, which is irrelevant
to the others being evaluated, has been experimentally
varied in order to measure its
effects on the pertinent characteristics. The
variable being manipulated is that of amenability
to suggestions. This variable was selected
because of the prevalence in industry
of situations where suggestions may be accepted
or rejected by the ratee and may,
therefore, influence the rater’s evaluation of
other characteristics. In order to complete
the experimental design a second variable,
the rater’s opportunity to make suggestions
to the ratee, was also manipulated.

* The author was a member of the faculty at the
University of Massachusetts when this study was
conducted. He wishes to express his gratitude to
his colleagues who [...] their class time [...],
and to Mr. Churchill Morgan for the
preliminary analyses of the data.

1 For a summary of the major studies, see (1).
Mahler’s (2) review is more comprehensive and
recent.


Procedures 


In five sections of a General Psychology
course² a guest lecturer was introduced to
the class as a student who was interested in 
becoming a college teacher. The classes, 
ranging in size from 19 to 30 students, were 
advised that they would be asked to evaluate 
his teaching ability after he had lectured. In 
all classes he delivered the day’s lecture in 
exceedingly poor fashion, making several glaring
pedagogical errors, although the material
itself was adapted from a well known textbook.³
After the first 15 minute period, the
experimental variable was introduced accord- 
ing to the plan shown in Table 1. Three of 
the groups (A, B, and C) wrote notes to the 
lecturer after the first 15 minutes, suggesting 
improvements to be made in his techniques. 
A second 15 minute lecture followed, which 
was as poor as the first. At the conclusion 
of this lecture the students evaluated the lec- 
turer using a rating scale described below. 

After looking over the notes in Group A 
the lecturer accepted the suggestions by 
thanking the students for them and express- 
ing his intention of modifying his techniques, 
as per their suggestions. In Group B he 
rejected their suggestions by telling them he 
had his own ideas on improvement. Although 
the students in Group C also wrote notes 
they were not submitted until the conclu- 
sion of the second 15 minute period of lec- 
ture. At this time they made their evalua- 
tions and then submitted their suggestions. 


2 The subjects were sophomore students at the
University of Massachusetts. Sections of this course
were randomly assigned to the experimental treat-
ments.

3 The guest lecturer was trained for approximately 
seven hours in order to insure that his delivery 
would be comparable in all classes. 


Table 1
Experimental Design

Treatment  15 minutes  10 minutes                                 15 minutes  10 minutes
A          lecture     suggestions written and accepted           lecture     rating
B          lecture     suggestions written and not accepted       lecture     rating
C          lecture     suggestions written but not submitted      lecture     rating
D          lecture     no suggestions; announcement read instead  lecture     rating
E          lecture     no suggestions; ratings made

Groups D and E were not given the oppor- 
tunity to suggest any changes to the ratee. 
Instead of writing suggestions Group D lis- 
tened to an announcement read by the offi- 
cially assigned instructor; the amount of time 
required for the announcement was roughly 
equivalent to the time other groups used in 
writing suggestions. 


Aaron J. Spector 


Group E made no suggestions and evalu-
ated the lecturer after the first 15 minute
period.

The lecturer was evaluated on a rating
scale containing five questions measuring:
(1) manner; (2) ability; (3) knowledge;
(4) potential; and (5) poise. For each ques-
tion the individual subjects checked one of
seven boxes which were ordered on a con-
tinuum, as illustrated by question 1, which
read, “Compared to others, this lecturer’s
manner while lecturing was: As poor as any
I’ve seen; Considerably worse than most;
Not quite as good as most; As good as most;
Somewhat better than most; Considerably
better than most; As good as any I’ve seen.”
The responses on each factor were weighted
0-6, higher scores being assigned to the more
favorable responses.
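
The scoring rule described above can be sketched in a few lines of Python. This is an illustration only: the names `RESPONSES`, `score_response`, and `total_score`, and the sample answers, are hypothetical; the response labels and the 0-6 weights are taken from the text.

```python
# Response categories of question 1, ordered from least to most favorable.
RESPONSES = [
    "As poor as any I've seen",
    "Considerably worse than most",
    "Not quite as good as most",
    "As good as most",
    "Somewhat better than most",
    "Considerably better than most",
    "As good as any I've seen",
]

def score_response(label):
    """Return the 0-6 weight of one checked box (higher = more favorable)."""
    return RESPONSES.index(label)

def total_score(labels):
    """Sum the weights over the five questions answered by one rater."""
    return sum(score_response(label) for label in labels)

# Hypothetical rater: one checked box for each of the five questions.
example = [
    "Considerably worse than most",   # manner
    "Not quite as good as most",      # ability
    "As good as most",                # knowledge
    "Not quite as good as most",      # potential
    "Considerably worse than most",   # poise
]
print(total_score(example))  # 1 + 2 + 3 + 2 + 1 = 9
```

With five questions weighted 0-6, a rater's total score under this sketch can range from 0 to 30.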


Results 


The most favorable ratings on all five fac-
tors were recorded by the acceptance group
(A), as shown in Table 2. The poorest rat-
ings were given by Group E, which made no
suggestions and had only 15 minutes of lec-
ture. The other no-suggestion group (D)
also rated the lecturer relatively unfavorably.
The mean ratings of the B and C groups were
equal, but higher than either D or E. It ap-
pears that expression of criticism of the lec-
turer, via written suggestions, resulted in
raters giving higher evaluations than when
the raters had no opportunity for this expres-
sion. These results obtained when the rater’s
suggestions were not submitted to the ratee,
as well as when they were submitted and ac-
cepted or rejected.


Table 2

Means and Standard Deviations of Ratings on Each Characteristic for Each Treatment

(Rows: M and SD for treatments A through E. Columns: questions 1 manner, 2 ability, 3 knowledge, 4 potential, 5 poise, followed by N, mean, and SD.)

The most favorable ratings, however, were
consistently made by the group whose sug-
gestions were accepted by the ratee. Appar-
ently, amenability to suggestions, or expressed
intention of compliance with the suggestions,
operated to bias the raters’ evaluations of the
lecturer.

The data were analyzed further by analysis
of variance. An F ratio, obtained with total
scores of all subjects in each treatment,
indicated that the mean total scores were
significantly influenced by the treatments ac-
corded the groups (Table 3).


Table 3

Analysis of Variance of Total Merit Rating Scores of All
Subjects in Five Experimental Treatments

Source of Variation     df    M.S.       F       P
Between treatments       4    2123.66    5.49    .01
Within treatments      132     386.51
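
For readers who wish to see how such an F ratio arises from raw scores, the following is a minimal sketch of a one-way analysis of variance in Python. The function name `one_way_anova` and the toy scores are hypothetical and are not the study's data. The degrees of freedom in Table 3 (4 and 132) correspond to k - 1 and N - k for five treatments and 137 subjects.

```python
def one_way_anova(groups):
    """Return (ms_between, ms_within, f_ratio) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-treatments sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatments sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical total scores for three small groups (not the study's data).
toy = [[12, 14, 13], [9, 10, 11], [6, 8, 7]]
ms_b, ms_w, f = one_way_anova(toy)
print(ms_b, ms_w, f)  # 27.0 1.0 27.0
```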


The ratings on each characteristic were
also examined. F ratios indicated that the
ratings on four of the questions varied sig-
nificantly between groups (Table 4). That
is, the experimental treatments accorded to
the groups differentially affected their ratings
on four out of the five characteristics.

The only Between Groups variance which
was not significantly different from chance
was on evaluation of the lecturer’s ability. If
the students measured the lecturer’s ability
by the amount they had learned or by the
quantity of notes they could take, it is un-
derstandable that their evaluations would
agree, since neither learning nor note taking
came easily from his lecture.


4 The variances were found to be homogeneous by
Bartlett’s test.

5 An average intercorrelation of .18 was obtained
between items on the rating sheet, using Peters and
Van Voorhis’ formula (4, pp. 196-200).




Table 4

Analysis of Variance of Ratings on Each Question
for All Treatments Simultaneously

            M.S. Between    M.S. Within
Question    Treatments      Treatments     df        F       P
1           2.902           1.052          4, 132    2.76    .05
2            .2025           .801          4, 132     .03
3           2.882           1.109          4, 132    2.60    .05
4           8.516           1.310          4, 132    6.50    .01
5           3.534           1.356          4, 132    2.61    .05
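
As a hedged arithmetic check, each reported F can be recomputed as the between-treatments mean square divided by the within-treatments mean square. The sketch below does this for the total score (Table 3) and for the four questions reported as significant (Table 4). The dictionary and function names are illustrative assumptions; the numbers are those reported in the tables.

```python
# Mean squares as reported in Tables 3 and 4: (between, within, reported F).
reported = {
    "total score": (2123.66, 386.51, 5.49),
    "question 1 (manner)": (2.902, 1.052, 2.76),
    "question 3 (knowledge)": (2.882, 1.109, 2.60),
    "question 4 (potential)": (8.516, 1.310, 6.50),
    "question 5 (poise)": (3.534, 1.356, 2.61),
}

def f_ratio(ms_between, ms_within):
    """F is the between-treatments mean square over the within mean square."""
    return ms_between / ms_within

for name, (ms_b, ms_w, f_reported) in reported.items():
    f = f_ratio(ms_b, ms_w)
    print(f"{name}: F = {f:.2f} (reported {f_reported:.2f})")
```

Each recomputed ratio rounds to the value printed in the corresponding table.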


However, no such simple criteria existed for 
rating his manner, knowledge, poise, or par- 
ticularly his potential. These ratings may 
reflect personal frames of reference and hence 
are more readily influenced by extraneous 
factors such as acceptance or rejection of 
suggestions. Similarly, the factors of pro- 
motability and quality of work, which are 
frequently found on industrial merit rating 
scales, may be especially prone to the influ- 
ence of irrelevant behaviors of the ratee. 


Discussion 


The cathartic effects of expression of criti- 
cism via written messages, noted above, are 
consistent with the findings of Thibaut and
Coules (5). Their data indicated that per-
sons who were insulted, and then allowed to
express their hostility toward the instigator, 
via written notes, later made a greater num- 
ber of friendly remarks about the instigator 
than did other insulted persons who had no 
opportunity to express their hostility. The
present data suggest that poor impressions,
like ill feelings, may be altered or reduced
by their expression.⁶ Low ratings may re-
flect a barrier in communications between the
supervisor and his subordinates, rather than
true deficiencies of the ratees. Therefore, a
likely hypothesis is that merit ratings in in-
dustry may be influenced by the degree to
which the rater feels free to criticize or make
suggestions to ratees.

6 A similar point is made by Newcomb (3) in his
discussion of “autistic hostility.”

The practical importance of the finding 
that irrelevant characteristics of ratees may 
bias raters’ judgments is difficult to evaluate 
without more knowledge of: (a) the kinds of 
ratee behaviors which act in this way; and 
(b) the amount of bias these behaviors in- 
duce. At any rate, it is clear that amen- 
ability to suggestions induces sufficient bias to 
significantly affect ratings on several factors. 


Summary 


Students in five sections of a general psy- 
chology course listened to a lecture which was 
intentionally delivered in poor fashion. They 
were then asked to rate the lecturer on five 
characteristics, using a seven point scale. Be- 
fore they rated him three of the groups sug- 
gested methods by which the lecturer might 
improve his techniques. One of these groups 
did not submit their suggestions to the lec-
turer; in another group the lecturer rejected
the suggestions, while in the third he ac-
cepted them. In two other groups the sub-
jects did not write suggestions. In no case
did the lecturer actually implement the sug-
gestions, or improve his delivery.

The ratings were: (a) consistently most
favorable in the acceptance group; (b) more
favorable in the suggestion than the no-sug-
gestion group; (c) significantly different on
the characteristics of manner, poise, potential
and knowledge.

It has been suggested that poor ratings
may reflect barriers in communications be-
tween the rater and the ratee, rather than
true deficiencies in the ratees.


Received November 20, 1953. 


References 


1. Guilford, J. P. Psychometric methods. N. Y.:
McGraw-Hill, 1936.
2. Mahler, W. R. Twenty years of merit rating,
1926-1946. N. Y.: The Psychological Corp.,
1947.
3. Newcomb, T. Autistic hostility and social re-
ality. Hum. Rel., 1947, 1, 69-86.
4. Peters, C. C. and Van Voorhis, W. R. Statisti-
cal procedures and their mathematical bases.
N. Y.: McGraw-Hill, 1940.
5. Thibaut, J. W. and Coules, J. The role of com-
munications in the reduction of inter-personal
hostility. J. abnorm. soc. Psychol., 1952, 47,
770-778.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Psychological Research on Accidents: Some Methodological
Considerations¹


Kenneth S. Teel
Human Factors Operations Research Laboratories

and

Philip H. Du Bois
Washington University


Recently controversy has arisen over the
methods that can appropriately and meaning-
fully be used in psychological research on ac-
cidents. An article by Mintz and Blum (8)
advocating use of the Poisson distribution and
analysis of variance for estimating the ex-
tent of personnel-centered accident liability
precipitated the discussion. In opposition,
Maritz (7) argued for “the direct tech-
nique of ‘correlating consecutive periods.’”
More recently, however, Blum
and Mintz (1) and particularly Webb and
Jones (13) have pointed out that the differ-
ent techniques are basically the same and
should for all practical purposes yield similar
results.

This paper aims to point out certain short-
comings of these methods and to propose
more refined solutions.


The Poisson Distribution 


A frequency distribution of numbers of ac-
cidents by individuals can be symmetrical
only when the mean number of accidents is
considerably greater than unity. When there
are fewer accidents than individuals, the zero
category must have the greatest frequency,
and superficially at least the obtained dis-
tribution must resemble the Poisson distribu-
tion, which is often used to estimate the extent to
which variations in accident histories may be
attributable to chance factors. Methods of
computing the theoretical distribution and of
testing the difference between it and the one
actually obtained are explained elsewhere (2,
8, 9).

1 This paper was prepared at MacDill Air Force
Base while PHDB was acting as consultant to the
Human Factors Operations Research Laboratories.
The opinions expressed in this paper are those of
the authors and do not necessarily reflect the views
of the Air Force.
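
The general idea of fitting a Poisson distribution to accident counts and testing the fit can be sketched as follows. This is a generic Pearson chi-square under stated assumptions, not necessarily the specific procedure of any of the cited references; the function names and the observed frequencies are hypothetical.

```python
import math

def poisson_expected(counts):
    """Expected Poisson frequencies, where counts[k] = persons with k accidents."""
    n_persons = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / n_persons
    expected = [n_persons * math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(len(counts))]
    # Fold the upper tail (more accidents than observed) into the last class.
    expected[-1] += n_persons - sum(expected)
    return mean, expected

def chi_square(observed, expected):
    """Pearson chi-square statistic for a fitted distribution."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: persons with 0, 1, 2, 3 accidents (not from any cited study).
observed = [60, 25, 10, 5]
mean, expected = poisson_expected(observed)
stat = chi_square(observed, expected)
# Degrees of freedom: classes - 1 - 1 (one parameter, the mean, was estimated).
df = len(observed) - 2
print(round(mean, 2), [round(e, 1) for e in expected], round(stat, 2), df)
```

In practice, classes with very small expected frequencies are usually pooled before the statistic is computed; the sketch omits that step for brevity.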

Interpretation of the obtained results is, 
however, fairly difficult. First, let us con- 
sider the simpler of the two possibilities— 
that in which the obtained distribution devi- 
ates significantly from a Poisson. Ordinarily 
this result is interpreted as indicating the 
presence in the population of varying degrees 
of accident proneness. Such an interpreta- 
tion is justified only if all persons were ex- 
posed to the same hazards; if this were not 
the case, the significant deviation from the 
Poisson might be reflecting little more than 
differences in exposure. 

Second, let us consider the opposite result 
—that in which the obtained distribution does 
not deviate significantly from a Poisson. 
This result is usually interpreted as indicat- 
ing that chance factors may account for the 
obtained variations in accident records and 
that the null hypothesis cannot therefore be 
rejected. Here again, however, the inter- 
pretation is open to question, for representa- 
tion of the data by a Poisson does not elimi- 
nate the possibility of significant correlation, 
either between accident records in successive 
periods or between accident records and logi- 
cally related predictors. Maritz (7) has al- 
ready demonstrated the possibility of ob- 
taining a correlation of .80 between two 
Poisson distributions of accidents in separate 
time periods. 
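
One way to see that Poisson-distributed counts in two periods can still be highly correlated is the simulation sketched below. The construction, a component shared by both periods plus an independent component for each, is an assumption chosen for illustration and is not necessarily the one Maritz used. The rates, seed, and function names are likewise hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def correlation(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumed rates: a shared component and two period-specific components.
LAM_SHARED, LAM_OWN = 0.8, 0.2
rng = random.Random(1954)
period1, period2 = [], []
for _ in range(20000):
    shared = poisson_draw(rng, LAM_SHARED)
    period1.append(shared + poisson_draw(rng, LAM_OWN))
    period2.append(shared + poisson_draw(rng, LAM_OWN))

# Each period's count is Poisson with mean 1.0; theory gives r = 0.8 / 1.0.
theoretical_r = LAM_SHARED / (LAM_SHARED + LAM_OWN)
print(theoretical_r, round(correlation(period1, period2), 2))
```

Because the sum of independent Poisson variables is itself Poisson, each period's distribution would pass a Poisson fit, yet the two periods correlate at about .80.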

The coarseness of the measuring unit—the 
fact of an accident—presents further logical 
problems in interpreting the results of a 
Poisson fit. For administrative purposes, 
some arbitrary definition of an accident is 
necessary; however, rigid adherence to the




definition forces into a discrete series behav- 
iors which really are on a continuum. It 
seems perfectly reasonable to assume that 
the same behaviors which might in one in- 
stance result in extensive materiel damage or 
personal injury might in another result in 
merely a “close call.” It therefore seems 
logical to assume also that the persons com- 
prising each of the accident frequency groups 
(zero, single, or multiple) exhibit these be- 
haviors in varying degrees and could, if more 
complete information were available, be fur- 
ther differentiated. 

In short, use of the Poisson distribution is 
at best a preliminary step in the study of 
accident proneness. It serves only to provide 
a quantitative estimate of the stability of 
accident rates. It furnishes no information 
whatsoever on the relationships between spe- 
cific personnel and situational factors on the 
one hand and accidents on the other.


Correlational Techniques 


Correlational techniques have most fre-
quently been used in accident research to
provide a quantitative estimate of the con-
sistency of the accident behavior of individu-
als from one period of time to another. In
. . . exposure in the two pe-
riods, . . . have chosen periods
. . . (4) or odd and even
. . . correlations are typi-
cally . . . little reliability of the
. . . the length
. . . Because of the
. . . the assumption of equal exposure among in-
dividuals be granted, the correlation of acci-
dents in separate time periods provides little
more than a preliminary estimate of the sta-
bility of the accident criterion. Such an ap-
proach affords no means of identifying per-
sons most liable to accidents except from
their accident histories. It does not, there-
fore, provide any basis for predicting which
members of an independent sample will have
accidents until after some have experienced
them. Since our goal is not only to reduce
but also to prevent accidents, this approach
obviously is inadequate.

A more adequate but less widely-used ap- 
plication of correlational techniques in acci- 
dent research is that of correlating individu- 
als’ scores on theoretically-related predictor 
variables (such as intelligence, psycho-motor 
ability, physical condition, etc.) with their 
accident records. In actual practice, however,
this approach (3) too has typically yielded
low correlations, again probably because of
the grossness of the accident criterion.


Suggested Refinements 


The design of research on personnel factors
in accidents is admittedly difficult. However,
we would like to suggest several refinements
which would make such research more useful.

The first and most needed refinement is a
more sensitive criterion measure. The sys-
tematic collection of information on “near
accidents” and the critical behaviors involved
therein would provide such a criterion (10).
The Strategic Air Command, USAF, has al-
ready adopted a policy of collecting these
data and using them as a basis for both
remedial and preventive training in how to
react in emergency situations. Thus, “near
accident” data may have practical value, even
before they accumulate in sufficient quantity
to provide a reliable criterion against which
personnel factor variables can be validated.
The high ratio of near-accidents to accidents
indicates that the length of the exposure pe-
riod required for reliable measurement would
be considerably shorter.


A second much-needed refinement is better
differentiation between “personal” and “situa-
tional” accidents.² We must discriminate be-
tween those accidents caused largely by situa-
tional factors and those caused primarily by
personnel factors. Theoretically, situational
accidents, such as those caused solely by ma-
teriel failure, occur completely independently
of the personnel involved. If this be the case,
we should not expect to be able to predict
them from a knowledge of personnel factor
variables only. The inclusion of situational
accidents in a study of personnel factors will
lower the obtained correlations and make
them extremely difficult to interpret.

One might legitimately ask if the proposed
differentiation can successfully be made. The
recent study of Kubis, Buckley, and Sack-
man (5) indicates that it can.

2 The comments contained in this and the follow-
ing paragraph apply to whatever criterion is used,
be it accidents, near-accidents, or a combination of
both. The term “accidents” alone is used in the
subsequent discussion solely for simplicity of pres-
entation.

A third requisite for more meaningful
studies of personnel factors in accidents is the
systematic collection of more complete infor-
mation on exposure to hazard. Whenever
possible, records of the time spent in per-
formance of the various aspects of a job
should be maintained, for they provide the
basis for determining relative hazards. Once
we have adequate data on the time spent and
accidents incurred on the several parts of a
job, we can determine the risk per unit of
time for each. We can then compute an
index of exposure for each person which
weights his experience in the various phases
of the job by the risk associated with each.
An index of this sort has recently been de-
veloped and successfully applied to an Air
Force population by Warren et al. (11).
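
The weighting described in this paragraph can be illustrated with a short sketch. It is not a reproduction of the index of Warren et al. (11), whose details are not given here; the job phases, hours, accident counts, and function names are all hypothetical.

```python
def risk_per_unit_time(hours_by_phase, accidents_by_phase):
    """Accidents per hour for each job phase, pooled over all persons."""
    return {phase: accidents_by_phase[phase] / hours
            for phase, hours in hours_by_phase.items()}

def exposure_index(person_hours, risk):
    """Weight one person's hours in each phase by that phase's risk."""
    return sum(hours * risk[phase] for phase, hours in person_hours.items())

# Hypothetical group totals for three job phases (illustrative values only).
hours_by_phase = {"takeoff": 1000.0, "cruise": 8000.0, "landing": 1000.0}
accidents_by_phase = {"takeoff": 4, "cruise": 4, "landing": 8}
risk = risk_per_unit_time(hours_by_phase, accidents_by_phase)

# Two hypothetical persons with the same total hours but different job mixes.
person_a = {"takeoff": 10.0, "cruise": 80.0, "landing": 10.0}
person_b = {"takeoff": 30.0, "cruise": 40.0, "landing": 30.0}
print(exposure_index(person_a, risk), exposure_index(person_b, risk))
```

Although both hypothetical persons logged 100 hours, the second has more than twice the expected number of accidents, which is the kind of difference an assumption of equal exposure would conceal.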

Should such an index be available for all mem-
bers of our sample, we need not make the
questionable assumption of equality of ex-
posure. Instead, we can, to some degree at
least, remove the effect of differential ex-
posure from our computations by partial cor-
relation or other appropriate technique.
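
Partial correlation, mentioned above as one way of removing the effect of differential exposure, uses the standard first-order formula sketched below. The three input correlations are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: x = predictor score, y = accidents, z = exposure index.
r_xy, r_xz, r_yz = 0.30, 0.40, 0.50
print(round(partial_correlation(r_xy, r_xz, r_yz), 3))  # 0.126
```

In this made-up case a raw correlation of .30 between predictor and accidents shrinks to about .13 once exposure is held constant.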
The fourth proposed refinement is the com-
putation of correlations between individuals’
scores on certain theoretically-related pre-
dictor variables and their records of accidents
in which the measured traits are thought to
be important. For example, we would hy-
pothesize that psycho-motor test scores should
predict only those accidents in which psycho-
motor deficiencies are a primary contributing
factor. Testing the significance of the ob-
tained correlation coefficient would then en-
able us to confirm or refute our hypothesis.
On the other hand, correlation of scores on
such tests with over-all accident records would
be likely to obscure any relationships which
might exist.

In groups where the number of accidents is 
small and the members are engaged in di- 
verse tasks for which the degree of hazard is 
difficult to estimate, correlational studies of 
the type just outlined are not likely to prove 
worthwhile. Consequently, a different at- 
tack must be made on the accident problem. 
The one recommended is detailed situational 
analysis, accomplished by experienced job 
analysts or safety engineers or both and fol- 
lowed by administrative actions designed to 
overcome identified hazards. As a matter of 
fact, a thorough situational analysis (6) is 
indispensable even in those instances (such 
as in the military and in large industrial or- 
ganizations) in which the correlational ap- 
proach is applicable, for the contribution of 
personnel factors to existing accident rates 
is usually considerably less than that of situa- 
tional variables. 

Until those factors, either within individu- 
als or within situations, related to accidents 
are clearly identified and remedial actions in-
stituted, much research remains to be done.
As yet, our research efforts have not ap- 
proached this point. 


Received December 21, 1953. 


References 


1. Blum, M. L. and Mintz, A. Correlation versus
curve fitting in research on accident prone-
ness: Reply to Maritz. Psychol. Bull., 1951,
48, 413-418.

2. Burke, C. J. A chi-square test for “proneness”
in accident data. Psychol. Bull., 1951, 48,
496-504.

3. Fitzpatrick, R., Vasilas, J. and Peterson, R.
Personnel and training factors in fighter air-
craft accidents. Human Factors Operations
Research Laboratories, Bolling AFB, Washing-
ton 25, D. C., 1953.

4. Ghiselli, E. E. and Brown, C. W. Accident
proneness among streetcar motormen and mo-
tor coach operators. J. appl. Psychol., 1948,
32, 20-23.

5. Kubis, J. F., Buckley, E. P., and Sackman, H.
The fatal ground accident in the United States
Air Force. Human Resources Research Labo-
ratories, Bolling AFB, Washington 25, D. C.,
1952.

6. LeShan, L. L. and Brame, J. B. A note on
techniques in the investigation of accident
prone behavior. J. appl. Psychol., 1953, 37,
79-81.

7. Maritz, J. S. On the validity of inferences
drawn from the fitting of Poisson and nega-
tive binomial distributions to observed acci-
dent data. Psychol. Bull., 1950, 47, 434-443.

8. Mintz, A. and Blum, M. L. A re-examination
of the accident proneness concept. J. appl.
Psychol., 1949, 33, 195-211.

9. Thorndike, R. L. The human factor in acci-
dents with special reference to aircraft acci-
dents. USAF School of Aviation Medicine,
Randolph AFB, Texas, 1953.

10. Vasilas, J. N., Fitzpatrick, R., DuBois, P. H.,
and Youtz, R. P. Human factors in near
accidents. USAF School of Aviation Medi-
cine, Randolph AFB, Texas, 1953.

11. Warren, N. D., Mackie, R. R., Simmons, R. F.,
and Rodman, I. L. An index of accident ex-
posure for flying in the USAF. Human Fac-
tors Operations Research Laboratories, Bolling
AFB, Washington 25, D. C., 1953.

12. Webb, W. B. and Jones, E. R. Repeater pilot
accidents in the United States Air Force. Hu-
man Resources Research Laboratories, Bolling
AFB, Washington 25, D. C., 1952.

13. Webb, W. B. and Jones, E. R. Some relations
between two statistical approaches to accident
proneness. Psychol. Bull., 1953, 50, 133-136.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Time Intervals between Accidents 


Alexander Mintz 
City College of New York 


This study was undertaken in the hope of 
contributing to a clarification of the evidence 
on which the widely accepted theory of in- 
dividual differences in accident liability is 
based. This evidence is incomplete. One of 
the principal facts included in it is the fre- 
quently-good fit of obtained accident dis- 
tributions to the so-called unequal liabilities 
distribution¹ derived by Greenwood and Yule
(4). This derivation includes the assumption
of constancy of individual liability to acci-
dents, i.e., the notion that accident liability
of particular persons does not change with
the passage of time or with the occurrence of
accidents.

However, there are theoretical considera-
tions which suggest that such an assumption
is not likely to be exact. An accident can be
expected to function at times as a traumatic
experience and to disrupt subsequent behav-
ior. It can also be expected to function as a
punishment and, as such, to have one or an-
other effect on the learning of the individual.
There are some accident distributions which
do not fit theoretical distributions based on
assumptions of the constancy of accident lia-
bility (3). Very likely, in such cases acci-
dent proneness is affected by the accidents.

Even in the cases in which theoretical dis-
tributions based on the assumption of con-
stant accident proneness do fit the data, the
possibility of inconstant accident proneness
cannot be excluded. It has been shown, e.g.,
by Irwin (6), that the same distribution
which Greenwood and Yule derived in part
from the assumption that accident liability
varies from one individual to another but is
constant for each individual, can also be de-
rived from the assumption that there are no
initial variations in accident liability, but that
instead each accident increases the proneness
of the individual by a constant amount. It
can undoubtedly also be derived from an as-
sumption of large initial differences in acci-
dent liability and decrease of accident lia-
bility with the occurrence of accidents.
Thus only tentative inferences may be drawn
about the probable underlying distribution of
accident liability from an obtained set of ac-
cident records, unless something is known
about whether and how accidents occurring
to people affect their accident liability. Not
even the existence of initial differences in ac-
cident liability in the group may be inferred
with certainty without such knowledge. In
the absence of such knowledge, the assump-
tion of unchanged liability after accidents
was generally either implied (1) or explicitly
made (10, 11) by workers in this field.

1 Or negative binomial distribution.

The factual evidence on the validity of the 
assumption of accident liability as unchanged 
by accidents is very scanty. Irwin has com- 
mented upon a few results on accident rates 
of groups of people in consecutive periods 
which were opposed to his hypothesis of acci- 
dent proneness increasing with accidents. The 
accident rates did not tend to increase; the 
changes were slight, and, if anything, the 
rates tended to drop. A rather similar find- 
ing has been discussed by Kerrich (8). On 
the other hand, Horn has presented material 
on time-intervals between airplane accidents 
which suggested to him that accident suscepti- 
bility is temporarily increased by accidents. 
He recommended adjustment techniques for 
pilots following accidents. Thus the question 
may be of great practical importance to those 
interested in preventing accidents. 


The Problem 


It was thought that further research on 
time-intervals between accidents was desir- 
able. Increasing accident proneness should 
show itself as a trend toward decreased time- 
intervals between accidents, while decreasing 
accident proneness should show the opposite 
trend. The problem of this paper was to dis- 
cover whether trends towards such changes 
of time intervals do or do not occur. 

However, there are methodological difficul- 
ties in such research. Thus it is not im-
mediately obvious how the interval before
the first accident and the interval after the 




last accident during the arbitrary observation 
period should be treated, compared to the 
time intervals between accidents. This paper 
reports an attempt to deal with one set of 
data on time-intervals between accidents. It 
is hoped to provide examples of the type of 
information which can be obtained from the 
study of time-intervals, and to present some 
material relevant to the methodology in this 
field. The material is examined chiefly in 
relation to two possible theories: first, that 
accident proneness is constant for each indi- 
vidual; and, conversely, that proneness is in- 
creased with accidents. 


The Data 


The data examined were accident records 
of 178 taxi drivers, made available by Dr. 
E. Ghiselli, whose cooperation is appreciated. 
The period covered was one year. For each 
driver, the weeks in which accidents occurred 
were indicated. All drivers had worked for 
the company at least a year prior to the be- 
ginning of the observation period. Six drivers 
who resigned from their jobs during the ob- 
servation period, or who were absent from 
work for eight or more weeks, were elimi- 
nated. Thus records of 172 drivers were in- 
cluded in this study. 


The Mathematical Background 


In order to discover what the time intervals 
before, between, and after the accidents indi- 
cate about the possible effects of accidents on 
accident liability, it is essential to compare 
them to the statistical expectancies based on 
the assumption that accidents are distributed 
over a time period completely at random. 
The hypothesis of random distribution of 
points within an interval has been previously 
studied by Whitworth (13), Greenwood (2),
Moran (12), and Maguire, Pearson and 
Wynn (9). It assumes that each accident is 
independent of all other accidents and that 
its occurrence is equally probable at all times 
during the period. However, this is assumed 
only if each accident is viewed as a separate 
entity, defined in terms of what happens 
(e.g., sideswiping a particular telephone pole) 
rather than in terms of the position of the 
accidents relative to each other, Sideswiping 
the telephone pole is more likely to be the 




first accident if it happens early in the ob- 
servation period than if it happens late; and 
so with other types of accidents. The prob- 
ability that a particular accident is the first 
one during the observation period is propor- 
tionate to the probability that no other acci-
dent has yet taken place. This probability
decreases with the passage of time.
If $n$ accidents have happened to an indi-
vidual during a time interval of unit duration,
the probability of the first accident happen-
ing at time $x$ decreases in proportion with
$(1-x)^{n-1}$ as $x$ increases. The probability
function of the first accident within a total
time interval of unit length is given by the
expression $n(1-x)^{n-1}$. The probability of
the second accident at time $x$ involves first,
that one accident must have taken place al-
ready, and second, that no other accident
shall have happened yet. Its formula is
$n(n-1)x(1-x)^{n-2}$, and similar expres-
sions may be derived for the probabilities of
the times of the other accidents.
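
The two expressions above are special cases of the standard density of the k-th of n points placed at random on a unit interval, n!/((k-1)!(n-k)!) x^(k-1) (1-x)^(n-k). The sketch below evaluates that general form and checks numerically that it integrates to one. The function names and the choice n = 4 are assumptions made for illustration.

```python
from math import factorial

def kth_accident_density(k, n, x):
    """Density at time x (0 <= x <= 1) of the k-th of n randomly placed accidents."""
    coeff = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return coeff * x ** (k - 1) * (1 - x) ** (n - k)

def integrate(f, steps=10000):
    """Midpoint-rule integral of f over the unit interval."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

n = 4  # hypothetical number of accidents for one driver
for k in range(1, n + 1):
    area = integrate(lambda x: kth_accident_density(k, n, x))
    mean = integrate(lambda x: x * kth_accident_density(k, n, x))
    print(k, round(area, 4), round(mean, 4))  # mean position is k / (n + 1)
```

For k = 1 the coefficient reduces to n and for k = 2 to n(n-1), which recovers the two formulas given in the text.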
Probability functions can generally be used
for the computation of theoretical means and
standard deviations. Such computations indi-
cate that, in terms of the null hypothesis, sta-
tistical expectancies for the mean time-inter-
vals from the beginning of the observation pe-
riod to the first accident; from the first to the
second; and so on, including the time-inter-
val between the last accident and the end of
the observation period are the same. Simi-
larly, the expectancies of the variances of the
time-intervals are also identical. In studying
the possible effects of accidents on accident
liability, the periods before the first accident
and after the last accident may be treated in
the same manner as the intervals between
accidents.
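
Under the random-placement hypothesis each of the n + 1 intervals has the same expectancy, T/(n + 1), and the same variance, T²n/((n + 1)²(n + 2)), where T is the length of the observation period. The sketch below applies these standard results with T = 52 weeks. The function name is hypothetical, and the group size of 16 used for the standard error is an assumption, chosen because it reproduces the standard error of 3.06 weeks reported in the Results for the two-accident case.

```python
import math

def interval_expectancy(n_accidents, period=52.0):
    """Mean and standard deviation of any one of the n + 1 intervals, in weeks."""
    n = n_accidents
    mean = period / (n + 1)
    # Each interval is distributed as period * Beta(1, n).
    variance = period ** 2 * n / ((n + 1) ** 2 * (n + 2))
    return mean, math.sqrt(variance)

mean, sd = interval_expectancy(2)
print(round(mean, 2), round(sd, 2))  # 17.33 12.26

# Standard error of a group mean; the group size of 16 is an assumption.
print(round(sd / math.sqrt(16), 2))  # 3.06
```

With a single accident the same function gives an expectancy of 26 weeks, the figure used for the one-accident group.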


Results 


Of the 172 drivers included in the com-
putation, 60 had no recorded accidents;
112 drivers had one or more accidents, rang-
ing up to 25. The accident distribution is
very different from the theoretical distribu-
tion which results from the assumptions of
equal and constant accident liability. In the
equal liability or so-called Poissonian dis-
tribution, the variance of accidents is equal
to the mean. In the Ghiselli data, it is about
six times as large as the mean. The distribu-
tion seems to be capable of being explained
in terms of the hypothesis of large stable dif-
ferences in accident liability. On the other
hand, in accordance with the considerations
mentioned earlier, it also can be explained in
terms of other assumptions, e.g., that of linear
increase of accident liability with accidents.
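
The comparison of variance with mean can be written as a simple ratio, which equals 1 in expectation for a Poisson distribution. The sketch below uses made-up counts for ten drivers, not the Ghiselli records, and the function name is hypothetical.

```python
def variance_to_mean_ratio(accident_counts):
    """Ratio of the variance of accident counts to their mean (1.0 under Poisson).

    The variance is computed with divisor n, treating the group as complete.
    """
    n = len(accident_counts)
    mean = sum(accident_counts) / n
    variance = sum((c - mean) ** 2 for c in accident_counts) / n
    return variance / mean

# Hypothetical counts for ten drivers (not the Ghiselli records).
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
print(round(variance_to_mean_ratio(counts), 2))  # 3.2
```

A ratio well above 1, as in this toy example, shows more spread than equal and constant liability would produce, but by itself does not distinguish stable differences from liability that changes with accidents.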

What do the time-intervals indicate? They
were first examined separately for groups of
drivers with different accident records.

A total of 45 drivers had one accident each.
The theoretical expectancy for the position of
the mean time of a single accident is 26
weeks. The obtained mean was 21.9 weeks.
The critical ratio was 1.99,² which is signifi-
cant at the .05 level. It should be noted
that this suggestive difference is in the op-
posite direction from the one which would be
expected if accident liability increased with
accidents. The question was not investigated
whether this result was due to the fact that
accident liability decreased with the first ac-
cident in this group, or whether it was pro-
duced by seasonal fluctuations.
In the two-accident case, the situation was
somewhat similar. The mean durations of
the three time-intervals (up to the first acci-
dent, between the first and second accidents,
and from the second accident until the end of
the observation period) were 13.1; 15.6; and
23.3, respectively. The theoretical expect-
ancy is 17.33, with a standard error of 3.06
weeks. The differences between the time-in-
tervals are again suggestive of a decrease in
accident liability after accidents, but the re-
sult does not seem to be statistically signifi-
cant.
TAa situation was somewhat different in 
a Cases which had three, four, and five acci- 
cone Here the first and last time-intervals 
ac € longer than the time-intervals between 

Cidents, but this finding was again of doubt- 
tha. atistical significance. Groups with more 
tiile five accidents were too small for de- 

ed Presentation. 
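The theoretical expectancy and standard error quoted for the two-accident group can be reproduced under a simple model. The sketch below assumes an observation period of 52 weeks (inferred from the 26-week expectancy for a single accident) and accidents falling independently and uniformly within that period, so that each interval is the period multiplied by a Beta(1, k) variable for k accidents. The function names are illustrative and are not taken from the paper.

```python
from math import sqrt


def interval_expectation(period, n_accidents):
    """Expected length of any one of the k + 1 intervals created by
    k points scattered uniformly at random over the period."""
    return period / (n_accidents + 1)


def interval_sd(period, n_accidents):
    """Theoretical standard deviation of one such interval.

    Each interval is period * Beta(1, k), whose variance is
    k / ((k + 1)**2 * (k + 2)).
    """
    k = n_accidents
    return period * sqrt(k / ((k + 1) ** 2 * (k + 2)))


def standard_error_of_mean(period, n_accidents, n_drivers):
    """Standard error of the mean interval over n_drivers drivers."""
    return interval_sd(period, n_accidents) / sqrt(n_drivers)


PERIOD_WEEKS = 52  # assumption: one-year observation period

expectancy = interval_expectation(PERIOD_WEEKS, 2)
std_error = standard_error_of_mean(PERIOD_WEEKS, 2, 16)
print(f"expectancy = {expectancy:.2f} weeks, SE = {std_error:.2f} weeks")
```

With two accidents and 16 drivers these functions give the 17.33 and 3.06 weeks stated in the text. The same model does not exactly reproduce the critical ratio of 1.99 reported for the one-accident group, so the author's computation for that case may have differed in detail.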
The significant fact which emerges from the examination of the mean time-intervals of groups of drivers, classified on the basis of number of accidents, is that there was no consistent trend toward a decrease of time-intervals with repeated accidents. Table 1 presents the data.

² The critical ratio rather than the t-ratio was used, because a theoretical standard deviation could be and was utilized.

In the preceding discussion, separate com- 
parisons of time intervals were made within 
each group of drivers with a particular num- 
ber of accidents. These groups were for the 
most part very small. Therefore the data 
were also treated in another way, in terms of 
cumulative groups. For all drivers who had 
one or more accidents, the mean time inter- 
val before their first (or only) accident was 
ascertained. The mean time interval before 
the second accident was ascertained for all 
drivers with two or more accidents, and for 
the same group the mean time interval be-
fore the first accident was also com-
puted. Similarly, the mean time intervals
before the third accident and before the first 
accident were determined for the group with 
three or more accidents, and so on. 

The results of these computations are pre- 
sented in Table 2, the first column of figures 
giving the mean times between the consecu- 
tive accidents and the second column giving 
the mean times before the first accidents of 
the same people, and the third column pre- 
senting the differences. It should be noted 
that according to both hypotheses considered 
in this paper, the figures in the first column 
should tend to decrease as one proceeds down 
the table. 
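The cumulative-group computation behind Table 2 can be sketched as follows. The input format (one sorted list of accident times in weeks per driver), the function name, and the three example drivers are all hypothetical; only the grouping logic follows the description above.

```python
def cumulative_table(accident_times):
    """Build rows of (k, drivers, mean interval between accidents k-1
    and k, mean time before first accident of the same drivers,
    difference) for every k from 2 up to the largest accident count.

    accident_times: list of per-driver lists of accident times in
    weeks since the start of observation, each sorted ascending.
    """
    rows = []
    max_k = max(len(times) for times in accident_times)
    for k in range(2, max_k + 1):
        # Cumulative group: every driver with k or more accidents.
        group = [times for times in accident_times if len(times) >= k]
        between = sum(t[k - 1] - t[k - 2] for t in group) / len(group)
        before_first = sum(t[0] for t in group) / len(group)
        rows.append((k, len(group), between, before_first,
                     between - before_first))
    return rows


# Three hypothetical drivers with one, two, and three accidents.
example = [[10.0], [4.0, 12.0], [6.0, 9.0, 20.0]]
rows = cumulative_table(example)
for row in rows:
    print(row)
```

The difference is taken as the first column minus the second, matching the sign convention of the legible rows of Table 2.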

According to the theory of individual dif- 
ferences in accident proneness, the same de- 
crease is expected in the second column; this 
is to be expected because the bottom of the 
table deals with drivers who had repeated 
accidents, because repeated accidents are apt 
to be indicative of high accident proneness, 
and because high accident proneness is apt to 
result in short time intervals both before the 
first accident and between the later acci- 
dents. According to the theory of increased 
proneness following accidents the decrease 
should be much more pronounced in the first 
column than in the second one, and the dif- 
ferences in the third column should tend to be 
negative and to increase in absolute amount. 

3 There are two reasons for expecting some down- 
ward trend in these figures according to the behavior 


disruption theory. First, there are selective factors: 
people who had the first accident early “by chance” 




Alexander Mintz 


Table 1 
Mean Times * Before the First Accident, Between Accidents, and After the Last Accident 


One-accident group (n = 45): 21.9; 30.1
Two-accident group (n = 16): 13.1; 15.7; 23.2
Three-accident group (n = 13): 16.1; 9.6; 12.9; 13.4
Four-accident group (n = 13): 14.0; 8.6; 5.5; 5.6; 18.3
Five-accident group (n = 4): 9.7; 6.0; 5.5; 7.8; 8.2; 14.8
Six-accident group (n = 5): 5.9; 4.8; 6.8; 9.2; 10.6; 10.0; 4.7
Seven-accident group (n = 3): 2.5; 10.0; 5.3; 5.7; 9.0; 7.0; 6.0; 6.5
Eight-accident group (n = 2): 1.0; 2.0; 6.5; 7.5; 5.5; 2.0; 2.5; 11.5; 13.5
Nine-accident group (n = 3): 2.2; 2.7; 9.7; 6.7; 3.0; 3.3; 1.3; 8.0; 5.0; 10.2
Eleven-accident group (n = 2): 8.5; 2.5; 1.5; 2.5; 1.5; 3.5; 6.0; 4.5; 8.5; 2.5; 6.0; 5.5
Twelve-accident group (n = 1): 2.5; 1; 6; 4; 6; 2; 1; 2; 6; 1; 2; 4; 14.5

Thirteen-accident group (n = 1): 1.531; 551; 2: 1; 8; 2; 13; 1; 11; 450.5 

Fifteen-accident group (n= 1): 1.5; 1;1;3;3; 3; 6; 5; 1; 4; 5; 3; 10; 2; 1.5 

Sixteen-accident group (n= 1): 2:5;:732;1; 1; 5; 6; 3; 1; 5; 2; 6; 1; 2;3;3.5 
Eighteen-accident group (n = 1): 0.5; 2; 1; 12; 1; 3351515151; 1; 5; 1; 2; 1; 3; 10; 4.5 me 1505 
Twenty-five-accident group (n = 1): 3.53151; 1;2;1;1;5; s 11; i6; 1;6; 1; 1; 1;7;1;1;3; 1545 


The increase of the magnitude of the differ- 
ences should occur because the accidents in- 
tervening between the first accident and the 
later ones are assumed to increase their acci- 
dent proneness. This factor would be as- 
sumed to produce a marked decrease in the 
figures of the first column; it would be as- 
sumed to be lacking in the case of the figures 
in the second column, in which only a weaker 
downward trend would be expected. 

There are a number of statistical procedures by means of which the agreement of the two hypotheses with the data [...] considerations was likely [...] space. Therefore the material is treated in terms of a simple inspection of the table. The expected tendency towards decreasing time intervals between the higher numbered accidents is present. As the incidence of accidents rises, the time interval before the first accident also tends to grow shorter, to about the same extent. As one reads down the table, there is no tendency toward larger negative values of differences. There are some fluctuations in the values of these differences, but these fluctuations are not large, do not suggest an intelligible pattern, and according to tentative computations do not seem to be statistically significant.

have more time left in which they [...] additional accidents; second: the drive[...] result of accidents occurring before the observation period.


Discussion 

These results are clearly not in favor of the hypothesis of increased accident susceptibility with accidents. For this set of data the theory of proneness, varying from person to person but reasonably constant for each person, appears to be more appropriate.

This conclusion requires qualification. It should be noted that certain factors were not taken into consideration in this study. The possibility of seasonal fluctuations in accident rates was one such factor. Another factor not considered had to do with the different distances driven by the different drivers. Both of these factors are likely to have functioned as sources of variation in the accident rates, and taking them into account should have given a somewhat better test of the hypothesis of constant accident proneness of individuals.

Table 2
Comparison of Mean Time Intervals in Weeks Before First Accident and Before Later Accidents

                                         Mean Time    Mean Time Interval
                                         Interval     of the Same Drivers
                                                      Before First Accident   Difference
Before 1st accident (112 drivers)          15.1
Between 1st & 2nd (67 drivers)              8.9            10.7                 -1.8
Between 2nd & 3rd (51 drivers)              7.3             9.9                 -2.6
Between 3rd & 4th (38 drivers)              6.0             7.8                 -1.8
Between 4th & 5th (25 drivers)              6.0             4.5                  1.5
Between 5th & 6th (21 drivers)              4.8             3.5                  1.3
Between 6th & 7th (16 drivers)              3.3             2.8                  0.5
Between 7th & 8th (13 drivers)              6.5             2.9                  3.6
Between 8th & 9th (11 drivers)              4.6             3.2                  1.4
Between 9th & 10th (8 drivers)              3.2             3.6                 -0.4
Between 10th & 11th (8 drivers)             3.2             3.6                 -0.4
Between 11th & 12th (6 drivers)             4.0             2.0              [illegible]
Between 12th & 13th (5 drivers)             4.8             1.9              [illegible]
Between 13th & 14th (4 drivers)             3.2             2.0                  1.2
Between 14th & 15th (4 drivers)         [illegible]     [illegible]          [illegible]
Between 15th & 16th (3 drivers)         [illegible]     [illegible]          [illegible]
Between 16th & 17th (2 drivers)         [illegible]     [illegible]          [illegible]
Between 17th & 18th (2 drivers)         [illegible]     [illegible]          [illegible]
Between 18th & 19th, 19th & 20th, etc.
  (1 driver)                            [illegible]         3.5              [illegible]


However, this hypothesis is probably only an approximation which is not applicable to all individuals and groups. The records of a few drivers suggest temporary fluctuations of accident proneness with some individuals. Temporary increases in accident proneness may well be due to periods of emotional stress.

However, there is need to investigate the 
statistical significance of such apparent fluc- 
tuations in accident proneness or liability. 
Maguire, Pearson and Wynn (9), Green- 
wood (2), and Irwin (7) have pointed out 
certain difficulties in determining the sta- 
tistical significance of departures of sequences 
of time intervals from randomness, and it is 
not entirely clear to this writer whether the 
problem has been solved. 

The apparent lack of systematic effects of 
accidents on accident rates found in this 
study need not hold for all groups. It may 
have partly resulted from the fact that all 
drivers had worked for the company at least 
a year before the observation period. Con- 
siderations based on the psychology of learn- 
ing suggest that accident proneness might be 
less constant with inexperienced workers. 
Research on time-intervals between accidents 
for inexperienced workers is worth attempt- 
ing. 

Our finding is not in agreement with Horn's conclusion that accident susceptibility is temporarily increased by accidents. This disagreement with Horn's conclusion may represent a difference between different kinds of accidents, since his data dealt with airplane accidents and ours with accidents to taxi-drivers. On the other hand, the discrepancy may be due to different statistical treatments. Possibly a statistical artifact was involved in his conclusions. Horn's tables show [...] apparently not [...] distribution of the [...]


Summary 


Much of the evidence in favor of the commonly accepted hypothesis of individual differences in accident proneness is only valid if one assumes that accident proneness of individuals is not affected by accidents in which they are involved. The validity of this assumption is investigated in terms of a study of time intervals between consecutive accidents of a number of taxi-drivers. Some features of the relevant mathematical theory of the random distribution of events in time are reviewed. The findings pertaining to the time intervals between accidents suggest that, for the group studied, the customary assumption of unchanged accident proneness following accidents is approximately true.


Received December 28, 1953. 


References 


1. Cobb, P. W. The limit of usefulness of accident rate as a measure of accident proneness. J. appl. Psychol., 1940, 24, 154-159.
2. Greenwood, M. The statistical study of infectious diseases. J. Roy. Stat. Soc., 1946, 109, 85-109.
3. Greenwood, M. and Woods, H. M. The incidence of industrial accidents upon individuals with specific reference to multiple accidents. Industr. Fatigue Res. Bd. Rep't 4, 1919.
4. Greenwood, M. and Yule, G. U. An inquiry into the nature of frequency distributions representing multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J. Roy. Stat. Soc., 1920, 83, 255-279.
5. Horn, D. A study of pilots with repeated accidents. J. Aviat. Medic., 1947, 18, 440-449.
6. Irwin, J. O. Discussion on Chambers and Yule's paper. J. Roy. Stat. Soc., 1941, 7 (suppl.), 101-107.
7. Irwin, J. O. Discussion on Professor Greenwood's paper. J. Roy. Stat. Soc., 1946, 109, 107-108.
8. Kerrich, J. E. The mathematical background. In Arbous, A. G. and Kerrich, J. E., Accident statistics and the concept of accident proneness. National Institute for Personnel Research No. 391, 1951, South African Council for Scientific and Industrial Research.
9. Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. The time interval between industrial accidents. Biometrika, 1952, 39, 168-180.
10. Mintz, A. The inference of accident liability from the accident record. J. appl. Psychol., 1954, 38, 41-46.
11. Mintz, A. and Blum, M. L. A reexamination of the accident proneness concept. J. appl. Psychol., 1949, 33, 195-211.
12. Moran, P. A. P. The random division of an interval. J. Roy. Stat. Soc., 1947, 9 (suppl.), 92-98.
13. Whitworth, W. A. Choice and chance. Reprint of 5th edit. Stechert, N. Y., 1934.


The Journal of Applied Psychology
Vol. 38, No. 6, 1954


Attitudes on Social Issues of Business Administrators and 
Students in a School of Business Administration 


Alexis M. Anikeeff 
Oklahoma A & M College 


Students enrolled in a School of Business Administration generally plan to establish careers in business organizations, and aspire to become business administrators. Although considerable effort is expended in relating student interests to future job success, the relationship of attitudes on significant social issues and future job success is apparently largely ignored. The general purpose of this study is to explore the possibility that student attitudes on social issues may serve as useful measures for prediction of success in various fields of endeavor. The specific purpose of this study is to measure the extent to which attitudes of business administrators differ from those of students in a School of Business Administration.


Procedure

Members of a seminar distributed questionnaires containing 40 statements to 78 business administrators and to 146 business school students. Respondents were forced to reply either yes or no to each statement. Five statements dealt with unionism, 10 with government control, [...] with personnel policy, 5 with profit distribution, 4 with the free enterprise system, and [...] with the desirability of business training on the college level.

Business administrators were selected on the basis of the size of their establishments and their willingness to cooperate. The largest organization in the general vicinity of the seminar member's home was contacted first. When the top administrative officer was unavailable, the questionnaire was completed by an individual who was second in command. About 80% of the firms contacted employed less than 25 persons [...] of the firms were located in Mississippi.

The student sample was composed entirely of Mississippi State College students who were enrolled in the School of Business and Industry. Approximately 60% of the students were upperclassmen. As in the case of business administrators, the student sample is composed entirely of students who were willing to cooperate with the survey.


Results


Significant or very significant differences 
between responses of business administrators 
and students of business administration were 
found on 20 of the 40 statements contained 
in the questionnaire. Specific details are 
found in Table 1. The statements are num- 
bered in accordance with their appearance in 
the questionnaire. The significance of the 
difference between percentages was estimated 
from the Lawshe and Baker nomograph.¹
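For readers without access to the nomograph, the significance of a difference between two percentages can be approximated with a pooled two-proportion z-test. This is a substitute technique, not the Lawshe and Baker procedure itself, and its significance levels need not coincide with those read from the nomograph. The function name is illustrative; the sample sizes (146 students, 78 administrators) and the percentages for item 16 are taken from the text and Table 1.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test.

    p1, p2: proportions replying yes in each group (0 to 1).
    n1, n2: group sizes.
    Returns (z, two-sided p-value under the normal approximation).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Item 16 (federal subsidy of educational institutions):
# 58 per cent of 146 students vs. 40 per cent of 78 administrators.
z, p_value = two_proportion_z(0.58, 146, 0.40, 78)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because the percentages in Table 1 are rounded to whole numbers, the z value is only approximate.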

It is noteworthy that significant differences 
between responses of administrators and stu- 
dents were found on every government con- 
trol statement which appeared in the ques- 
tionnaire. In all cases the students professed 
to be more favorably disposed toward gov- 
ernment regulation and control than the ad- 


ministrators. 
Discussion 


Although a marked divergence of attitudes
is indicated on one-half of the issues pre- 
sented to both students and the administra- 
tors, the effect of this situation upon the 
future success of the students in their roles 
as administrators is not clearly discernible. 
Some shifting of attitudes may take place 
when students are placed on the job. There 
is also the possibility that some shifting of 
attitudes may occur on the part of adminis- 
trators. In the event that student attitudes 
toward government control will not subject 
the students to unfavorable discrimination by 
present business administrators, it may be 
reasonable to believe that the forthcoming 
generation of administrators will be less prone 
to believe that the whole American economy 
will collapse with the further extension of gov- 
ernmental influence in business affairs. 

1 Lawshe, C. H. and Baker, P. C. Three aids in 


the evaluation of the significance between percent- 
ages. Educ. psychol. Measmt., 1950, 10, 263-270. 


Table 1
Distribution of Responses to Items in Questionnaire
Per Cent Replying Yes
Item    Student    Administrator    Diff.
15%* 
1. Business should receive government subsidies. tet ett eee eee eee nes 19% 17* 
3. Corporations should be taxed higher than individuals. 63 25% 
4. Price control will destroy the free enterprise system. pja me ss ert aa 60 
7. Workers will do their best work only if strict discipline is maintained 16" 
Dy CHE SUPELVISON xis ties wieesyeisceiaia es He ary aie eter eee 30 46 12* 
9. Employees should have company sponsored retirement Aan ARN 84 72 306% 
11. More jobs should be covered by the minimum wage law 40 ae 
13. Labor unions help industrial progress.................0000-0000... 60 
15. A worker who is “‘no good” on one job will probably be “no good” on We 
AD STORHE onana a OAS die spe ayoraienriv fetteeeee 10 24 1s** 
16. The federal government should subsidize educational institutions... . . 58 40 
23. A Fair Employment Practices Commission should be established in 13** 
MISSISSIPPI. o us saaan eaaa as inaa as Hb Ha sintineremnnae on cis oe 5 55 33 13" 
30. Shifty-eyed persons are dishonest.......... 18 33h 
31. Unemployment benefits should be abolished. . . 36 m 
32. People with red hair are emotionally unstable, . . 13 16" 
33. You can tell a person’s intelligence by interviewin him 42 i 
i 10 
35. Government old age pensions should be i C s M 23 
36. There should be absolutely no government control or regulation of pri- 16" 
vate business 45 oe 
37. Labor unions will destroy the free enterprise system................ 24 40 1 
38. Government should compete with private business whenever the public 5* 
E T A A atissisijcumemmeumccnn: 69 54 í 
39. Profits resulting from increased productivity should be divided equally et 
among stockholders, labor, and the CONSUIMELS an ionni 28 64 ii 
40. You cannot have democracy without the free enterprise system. ..... 73 53 20 


* Indicates 5% level of confidence. 
** Indicates 1% level of confidence. 


Summary 

An attitude survey blank containing 40 
statements and covering the areas of govern- 
ment control, personnel policy, profit distribu- 
tion, labor unionism, and the free enterprise 


system was completed by 78 business ad- 
ministrators and 146 business administration 
students. 

1. Significant differences were found on 20 
of the 40 items contained in the question- 
naire between responses of the two groups. 


2. Disagreement was greatest in the area of government control. Students were significantly more favorably disposed to government control than the administrators.

3. Despite the marked divergence between attitudes of the two groups, the possibility exists that some student attitudes may shift toward those professed by administrators when the students are forced to solve problems presently faced by the administrators.

Received November 25, 1953. 


The Journal of Applied Psychology
Vol. 38, No. 6, 1954


The Guilford Zimmerman Temperament Survey as a Predictor of Achievement Level and Achievement Fluctuation in Introductory Psychology


A. W. Bendig and J. L. Sprague
University of Pittsburgh


Student achievement in college courses where multiple measurements are taken over [...] involves two aspects: consistency [...]. Inter-student differences in the average level of course achievement over several examinations are used as the basis for assigning course grades and result in moderately high reliability indices (1). However, the reliability of average achievement levels is less than perfect due to intra-student fluctuation in performance from test to test. [...] (4) has shown that measures of inconsistent responses in a retest situation have a moderate but significant degree of reliability and suggests that measures of achievement fluctuation within a course may reliably measure an important aspect of student achievement performance.

[...] suggesting that achievement fluctuation is particularly important in attempting to predict achievement level. The usual educational procedure of averaging scores or grades on several course tests to arrive at a final course grade implies that measures of achievement level and of achievement fluctuation will be related in a nonlinear manner. Students receiving A course grades are most likely to have shown consistent A or B performance on each test and those receiving F grades to have achieved at D or F levels on each test. However, the C or middle group includes both students who have received consistent C grades on each test and also those who have fluctuated widely from A to F on separate examinations, but whose average grade ends up as a C. We would predict a curvilinear relationship between measures of achievement level and achievement fluctuation, with the middle achievement level groups showing the largest average fluctuation and the extreme level groups (high and low) demonstrating significantly smaller fluctuation.



If this analysis is correct it then bears im- 
portantly on the problem of predicting stu- 
dent achievement in the first course in psy- 
chology. Aptitude test scores have shown 
moderate, but important rectilinear relation- 
ships with achievement level (2, 9, 10), but 
attempts to use personality and interest 
scales to predict level when aptitude is sta- 
tistically held constant have been fruitless 
(7, 8). This lack of success may be due to: 
(a) curvilinear relationships existing between 
such scales and achievement level which are 
not revealed by rectilinear correlation tech- 
niques; or (b) these personality and interest 
scales being predictive of only the same type 
of student behavior that is predicted by apti- 
tude tests. The first of these hypotheses 
could be tested by computing curvilinear cor- 
relation coefficients (eta or epsilon) in addi- 
tion to rectilinear Pearsonian coefficients and 
applying standard tests of significance to the 
difference between the pairs of rectilinear and 
curvilinear coefficients. Hypothesis (b) could 
be assessed by correlating the personality 
and/or interest scales with aptitude tests 
known to be related to the achievement cri- 
terion and finding the partial correlation of 
scales and achievement with aptitude sta- 
tistically held constant. However, our as- 
sumed relationship between achievement level 
and fluctuation suggests a third hypothesis: 
(c) a predictor may be rectilinearly related
to both level and fluctuations, but because of 
the curvilinear confounding of level and fluc- 
tuation may show a zero correlation with 
level. This third hypothesis could be tested 
by correlating each scale with measures of 
both level and fluctuation and tempering our 
judgments of nonsignificant scale-level correlations
in the light of obtained scale-fluctua-
tion relationships. 

Several recent studies have suggested that the Guilford Zimmerman Temperament Survey (5) may be useful in predicting student achievement in introductory psychology.
Klugh (6) found three scales on the GZTS 
to correlate positively and significantly with 
total scores on the ACE. For a sample of 
225 male students the Objectivity and Friend- 
liness scales were significant at the .01 level 
(r = .19 and .18), while the Personal Rela- 
tions scale was significant at the .05 level 
(r= .14). Of the remaining seven scales, 
only the Masculinity scale approached sig- 
nificance (r = .11). Since the ACE is sig- 
nificantly related (R = .47) to achievement 
level in introductory psychology (10) these 
GZTS scales could be expected to show some 
correlation with the same achievement vari- 
able. However, Krumm (8), using the ACE 
as a predictor, obtained discrepancy scores 
between the predicted and obtained grades 
in introductory psychology (N = 410) and 
identified the top and bottom quarters of the 
resulting distribution. Comparing the mean 
GZTS scores of these groups of “over- 
achievers” and “underachievers” showed none 
of the GZTS scales to significantly discrimi- 
nate between these extreme groups. 

The problem of the present research was to compare the relation of GZTS to achievement level in introductory psychology when both rectilinear and curvilinear correlation techniques are used, and to make a similar comparison when achievement fluctuation is used as the criterion.


Procedure 


[...] students enrolled [...] at the University of Pittsburgh [...] 1953, semester and who w[...] The Guilford Zimmerman Temperament Survey (5) was administered to all sections near the beginning of the semester by trained examiners.¹ Raw scores on each of the ten GZTS scales were used in the later analysis.

The achievement variables were derived from students' scores on five course achievement examinations given during the semester. All five tests were objective, 50-item, multiple-choice examinations and the raw scores from each test were converted to standard scores based upon the performance of all students in introductory psychology on each single test. The large majority of students (N = 126) took all five tests during the semester, but graduating seniors (N = 22) were excused from the last two tests. The remaining students (N = 7) had missed one of the tests and had received an incomplete grade in the course, but were retained in the present sample. Details of the course evaluation procedure have been previously described (1).

The variable of achievement level used in this study was the letter grade received by each student. These grades were determined by finding the mean of each S's test standard scores and converting this average to a letter grade on the basis of previously established cutting points which were common to all sections. This achievement level variable has been shown to have an estimated reliability of .80 (1, p. 316).

The achievement fluctuation variable was derived from the range between the highest and lowest test scores received by each S over the semester. Dixon and Massey (3, pp. 240-241) have shown that the range in small samples is a highly efficient estimate of the population variability and is computationally much simpler than computing the standard deviation of each S's scores. The reliability of this fluctuation measure was estimated by drawing a random sample of 100 Ss who had taken all five course tests. The range was computed for each S as the difference between his highest and lowest scores and a second measure of the fluctuation was obtained by finding the range between the S's second highest and second lowest test scores. Correlating these pairs of fluctuation measures for the 100 Ss gave a coefficient of .59, which indicates that the range is a relatively stable measure of achievement fluctuation. Finally, the range for each of the 155 Ss in the study sample was multiplied by the constants given by Dixon and Massey (3, p. 240) to give an estimate of the standard deviation of each S's achievement test scores.

¹ Appreciation is expressed to Dr. Frederick Herzberg of Psychological Services of Pittsburgh who supervised the administration and scoring of the temperament survey.
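The conversion of a range into an estimate of the standard deviation can be sketched as follows. The sketch assumes that the Dixon and Massey constants are the reciprocals of the standard d2 factors (the expected range of n observations from a unit normal distribution); the d2 values below are the usual control-chart constants, and the function name and example scores are hypothetical.

```python
# d2: expected range of n independent standard-normal observations.
# Assumption: the multipliers cited from Dixon and Massey equal 1 / d2.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sd_from_range(scores):
    """Estimate the standard deviation of a small sample from its range.

    scores: a student's standard scores on 2 to 5 tests.
    """
    n = len(scores)
    if n not in D2:
        raise ValueError("this sketch only covers 2 to 5 scores")
    return (max(scores) - min(scores)) / D2[n]


# Hypothetical standard scores for one student on five tests,
# and for a graduating senior who took only the first three.
five_tests = [48, 55, 52, 61, 50]
three_tests = [48, 55, 52]
print(sd_from_range(five_tests), sd_from_range(three_tests))
```

Dividing by a constant that depends on the number of tests puts students with three, four, or five scores on a comparable scale, which a raw range would not.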


Results 


The distribution of achievement fluctuation measures appeared positively skewed and was tested for normality by grouping individual measures into six categories and applying the usual chi-square test. Chi-square equalled 9.94 which with 3 degrees of freedom was significant at the .05 level. To reduce skewness these measures were put through a square-root transformation and the transformed distribution tested for normality by the usual chi-square method. The chi-square value with three degrees of freedom was 5.60 which is not significant at the .05 level of confidence. The mean and variance of the fluctuation measures were computed for each of the five achievement level groups and an analysis of variance performed to test the hypothesized curvilinear relationship between level and fluctuation. The F comparison between the means gave an F value of 4.03 which is significant at the .01 level. This F corresponds to a curvilinear correlation (eta) of .31, while the product-moment rectilinear correlation between level and fluctuation gave a nonsignificant r of -.15. A chi-square test of the significance of the difference between their curvilinear and rectilinear coefficients gave a chi-square of 12.40, which, with 3 degrees of freedom, is significant at the .01 level. A chi-square test of the homogeneity of the variances of the fluctuation measures within the five achievement level groups yielded a value of 5.61 which, with four degrees of freedom, is not significant at the .05 level. The mean fluctuation for each of the achievement level groups can be found in Table 1. As hypothesized, the middle achievement level groups show the largest average fluctuation and the extreme level groups (A and F) demonstrated significantly less average fluctuation.
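The correspondence between the F value and eta can be checked with a short computation. The degrees of freedom (4 and 150) are an inference from the five level groups and 155 Ss, not figures stated in the text. The chi-square formula for the departure from linearity is one common form and is an assumption about the authors' method; it comes close to, but does not exactly equal, the reported 12.40.

```python
from math import sqrt


def eta_from_f(f_value, df_between, df_within):
    """Correlation ratio implied by a one-way analysis of variance F."""
    explained = f_value * df_between
    return sqrt(explained / (explained + df_within))


def linearity_chi_square(eta, r, n_subjects, n_groups):
    """Assumed test of eta against r, referred to chi-square with
    n_groups - 2 degrees of freedom."""
    return (n_subjects - n_groups) * (eta ** 2 - r ** 2) / (1 - eta ** 2)


N_SUBJECTS, N_GROUPS = 155, 5  # from the text
eta = eta_from_f(4.03, N_GROUPS - 1, N_SUBJECTS - N_GROUPS)
chi_square = linearity_chi_square(eta, -0.15, N_SUBJECTS, N_GROUPS)
print(f"eta = {eta:.2f}, chi-square = {chi_square:.2f}")
```

The small gap between the computed chi-square and the published value is consistent with rounding of r and eta in the original computation.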

Table 1

Distribution of Achievement Fluctuation Measures for the Achievement Level Groups

Achievement    Number of    Fluctuation    Standard
Level          Subjects     Mean           Deviation
A                 21          3.12           .91
B                 32          3.68           .91
C                 62          3.83           .88
D                 26          4.05           .60
F                 14          3.49           .92
Significance of Means (F)                 4.03**
Homogeneity of Variances (chi-square)     5.61

** Significant at the .01 level.


Since our two measures of achievement, level and fluctuation, are nonlinearly related,
it was necessary to perform a further trans- 
formation on the fluctuation measures to 
clarify the relation of GZTS scales to these 
two criteria. To insure a zero correlation be- 
tween level and fluctuation each S’s fluctua- 
tion measure was taken as a deviation (plus 
or minus) from the mean fluctuation of his 
achievement level group. Since the variances 
of the fluctuation measures within the level 
groups appeared to be homogeneous, the fur- 
ther step of dividing each deviation by the 
standard deviation of its level group was not 
necessary. These deviations from all five 
level groups were then pooled and divided 
into five fluctuation groups. Group I in- 
cluded the 20 Ss showing the largest intra- 
subject achievement fluctuation (independent 
of achievement level), Group V comprised 17 
Ss with the smallest intra-subject fluctuation, 
with Groups II, III, and IV consisting of Ss 
showing intermediate amounts of fluctuation. 
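The within-group deviation step described above can be sketched as follows. The function name and the six example students are hypothetical; the point is only that subtracting each level group's mean leaves deviations that average zero within every level, which removes any association between level and mean fluctuation.

```python
from collections import defaultdict
from statistics import mean


def center_within_level(levels, fluctuations):
    """Express each fluctuation measure as a deviation (plus or minus)
    from the mean fluctuation of the student's achievement level group."""
    by_level = defaultdict(list)
    for level, value in zip(levels, fluctuations):
        by_level[level].append(value)
    group_mean = {level: mean(values) for level, values in by_level.items()}
    return [value - group_mean[level]
            for level, value in zip(levels, fluctuations)]


# Hypothetical letter grades and fluctuation measures for six students.
levels = ["A", "A", "C", "C", "F", "F"]
fluctuations = [3.0, 3.4, 4.5, 3.9, 3.2, 3.6]
deviations = center_within_level(levels, fluctuations)
print([round(d, 2) for d in deviations])
```

Dividing each deviation by its group's standard deviation would be the further step that the authors judged unnecessary because the group variances appeared homogeneous.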

Raw scores on the ten GZTS scales were 
then correlated with the criterion measures 
of achievement level and achievement fluctua- 
tion. Rectilinear product-moment correlations 
were computed by weighting achievement 
level groups A through F with unit digits 4 
through 0 and similarly weighting achieve-
ment fluctuation groups I through V with the
same weights. These weights were then cor- 
related with the raw GZTS scores. In addi- 
tion, curvilinear correlations (eta) between 
the GZTS scales and the two criterion meas- 
ures were computed and chi-square tests of 
the significance of the difference between the 
rectilinear and curvilinear coefficients evalu- 
ated. These correlations and tests of curvi- 
linearity are given in Table 2. It can be 
noted that the GZTS Objectivity scale is 
rectilinearly related to achievement level and
this is probably also true of the Restraint 
scale. Friendliness and Masculinity are re- 
lated to level in a curvilinear fashion, but the 
product-moment coefficients for these two 
scales are not significant. None of the GZTS 
scales are related to achievement fluctuation 
when only the product-moment coefficients 
are considered, but Ascendance, Social Inter- 
est, and Emotional Stability are curvilinearly 
related to fluctuation. 
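
The contrast between the two coefficients can be made concrete with a small sketch. It assumes that eta is the correlation ratio of scale scores on the criterion groups (the text does not state the direction), and the data are hypothetical. The chi-square test of the difference between r and eta is not reproduced, since the text does not give its formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment (rectilinear) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(groups, y):
    """Eta: square root of between-group sum of squares over total sum of squares."""
    grand = sum(y) / len(y)
    by_group = {}
    for g, value in zip(groups, y):
        by_group.setdefault(g, []).append(value)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand) ** 2 for vals in by_group.values()
    )
    ss_total = sum((value - grand) ** 2 for value in y)
    return sqrt(ss_between / ss_total)


# Hypothetical data: criterion group weights and scale scores forming an inverted U.
weights = [0, 0, 1, 1, 2, 2]
scores = [1, 2, 5, 6, 2, 1]
r = pearson_r(weights, scores)
eta = correlation_ratio(weights, scores)
```

In this constructed case r is zero while eta is large, which is the pattern reported for scales such as Ascendance with fluctuation: a relation invisible to the product-moment coefficient but present in the group means.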




A. W. Bendig and J. L. Sprague 


Table 2 


Rectilinear and Curvilinear Correlations between Guilford-Zimmerman Scales and 
Achievement Level and Fluctuation 


r = Product-Moment Correlation; Eta = Curvilinear Correlation; 
Chi-Square = Significance of Difference between r and Eta 

                        Achievement Level           Achievement Fluctuation 
GZTS Scale              r      Eta    Chi-Square    r      Eta    Chi-Square 
General Activity       -.13    .18    2.55         -.02    .18    [illegible] 
Restraint               .20*   .24*   2.56         -.03    .08    [illegible] 
Ascendance             -.13    .19    2.80          .00    .35**  [illegible] 
Social Interest        -.14    .20    2.98          .11    .27*   [illegible] 
Emotional Stability     .13    .17    2.13          .05    .24*   [illegible] 
Objectivity             .21*   .28*   5.34          .03    .07    [illegible] 
Friendliness            .11    .27*   9.16*         .15    .17    [illegible] 
Thoughtfulness         -.02    .04    0.18          .09    .14    [illegible] 
Personal Relations     -.08    .19    4.68          .12    .17    [illegible] 
Masculinity       [illegible]  .25*   5.79         -.09    .12    [illegible] 

[The chi-square column for fluctuation is illegible in the source; the one 
surviving value, 18.08 (with a significance mark), cannot be assigned to a row.] 


* Significant at the .05 level. 
** Significant at the .01 level. 


Discussion 


Our results confirm the hypothesis of a sig- 
nificant curvilinear relation between achieve- 
ment level and achievement fluctuation. This 
indicates that our level criterion is an impure 
measure of achievement differences between 
students and suggests that similar criteria 
used widely in educational research [...] simi- 
larly contaminated [...]able. Nor is [...] 
chance phenomenon: [...] significant correla- 
tion between [...] measures of fluctuation [...] 
study shows its reliability. 


The correlations in Table 2 bear on points 


(a) and (c) made in the third paragraph of 
this paper. Two GZTS scales, Restraint and 
Objectivity, are rectilinearly related to our 
contaminated level criterion, but two addi- 
tional scales, Friendliness and Masculinity, 
show insignificant rectilinear, but significant 
curvilinear correlations with level. This con- 
firms point (a), since neither of these last 
two scales would have appeared to be re- 
lated to level if curvilinear correlation tech- 
niques had not been used. However, point 
(c), as expressed in the second paragraph, is 
not confirmed, since none of the GZTS scales 


are rectilinearly related to fluctuation. The 
significant curvilinear correlation of three 
GZTS scales, Ascendance, Social Interest, and 
Emotional Stability, with fluctuation [suggests] 
that point (c) is too naively stated. Omitting 
the word “rectilinearly” in point (c) 
yields a hypothesis that appears [plaus]ible 
in view of our findings. Perhaps these three 
GZTS scales show essentially zero relationships 
with the impure criterion of [level] because 
of their curvilinear correlation with the 
pure measure of fluctuation. These results 
do not confirm point (c), but suggest that a 
modified form of the hypothesis is tenable. 

The available data did not permit a direct 
test of point (b). However, there are sug- 
gestive consistencies and discrepancies be- 
tween our results and previous studies (6, 8, 
10) that indirectly bear on this point. Klugh 
(6) found the Objectivity, Friendliness, and 
Personal Relations scales to be significantly 
related to the total score on the ACE and 
the Masculinity scale to fall just short of sta- 
tistical significance. Russell and Bendig (10) 
demonstrated a significant relation between 
the ACE and our level criterion, while Krumm 
(8) showed the GZTS scales were not predic- 
tive of achievement level in introductory psy- 
chology when the variability in level attribut- 
able to ACE differences was statistically elimi- 
nated. We found the Restraint, Objectivity, 
Friendliness, and Masculinity scales signifi- 
cantly related to level when academic apti- 
tude is uncontrolled. These results suggest 
that the Friendliness and Objectivity scales 
on the GZTS measure the same aspects of 
achievement performance that are measured by 
the ACE and could not profitably be used in 
a regression equation along with the ACE to 
predict achievement level in introductory psy- 
chology. However, the Restraint and Per- 
sonal Relations scales probably would in- 
crease the predictability of level if used in 
conjunction with the ACE: the Restraint 
scale because of its significant correlation 
with level and its low correlation with ACE, 
and the Personal Relations scale because it 
could act as a suppressor variable, given its 
lack of correlation with level and its signifi- 
cant relation to the ACE. Admittedly the 
predictive usefulness of these two scales is 
unproven, but it provides a hypothesis to be 
tested in a later sample. 


Summary 


Scores on the Guilford-Zimmerman Tem- 
perament Survey were correlated by both 
rectilinear and curvilinear methods with meas- 
ures of course achievement level and intra- 
student achievement fluctuation in introduc- 
tory psychology (N = 155). Achievement 
level and fluctuation were curvilinearly re- 
lated and the fluctuation measures were ad- 
justed to remove this artifact. Two GZTS 
scales, Restraint and Objectivity, were recti- 
linearly related to level (r = .20 and .21), 
while two additional scales, Friendliness and 
Masculinity, showed significant curvilinear 
correlations with level (eta = .27 and .25). 




None of the GZTS scales were rectilinearly 
related to fluctuation, but three scales, As- 
cendance, Social Interest, and Emotional Sta- 
bility, were curvilinearly correlated with fluc- 
tuation (eta = .35, .27, and .24). 


Received November 27, 1953. 


References 


1. Bendig, A. W. The reliability of letter grades. 
Educ. psychol. Measmt., 1953, 13, 311-321. 

2. Carlson, H. B., Fischer, R. P., and Young, P. T. 
Improvement in elementary psychology as re- 
lated to intelligence. Psychol. Bull., 1945, 42, 
27-34. 

3. Dixon, W. J. and Massey, F. J. Introduction 
to statistical analysis. New York: McGraw- 
Hill, 1951. 

4. Glaser, R. The reliability of inconsistency. 
Educ. psychol. Measmt., 1952, 12, 60-64. 

5. Guilford, J. P. and Zimmerman, W. S. Guil- 
ford-Zimmerman Temperament Survey Manual 
of Instructions and Interpretations. Beverly 
Hills, California: Sheridan Supply Co., 1949. 

6. Klugh, H. E. The relationship between some 
aspects of temperament and academic apti- 
tude. (Unpublished study.) 

7. Klugh, H. E. The prediction of academic 
achievement from measures of personality. 
Unpublished master’s thesis, Univer. of Pitts- 
burgh, 1952. 

8. Krumm, R. L. Interrelationships of measured 
interests and personality traits of introduc- 
tory psychology instructors and their stu- 
dents as related to student achievement. Un- 
published doctor’s dissertation, Univer. of 
Pittsburgh, 1952. 

9. Newman, S. E., Duncan, C. P., Bell, G. B., and 
Bradt, K. H. Predicting student performance 
in the first course in psychology. J. educ. 
Psychol., 1952, 43, 243-247. 

10. Russell, H. E. and Bendig, A. W. Student rat- 
ings of instructors and course achievement 
with academic aptitude controlled. Educ. 
psychol. Measmt., 1953, 13, 626-635. 


The Journal of Applied Psychology 
Vol. 38, No. 6, 1954 


Proposed Hostility and Pharisaic-Virtue Scales for the MMPI * 


Walter W. Cook 


University of Minnesota 


and 


Donald M. Medley 


Indiana University 


This article describes an attempt to de- 
velop scales for the Minnesota Multiphasic 
Personality Inventory (MMPI) which meas- 
ure a person’s ability to get along well with 
others. Such scales should prove valuable in 
selecting personnel who must deal with the 
public or work harmoniously and effectively 
with a group. The validity of the scales re- 
ported here is based on their power to predict 
the rapport of teachers with pupils in a class- 
room. Since the content of the items is not 
directly related to school work, it is believed 
that the scales may prove useful also in the 
selection of sales people, officers and non- 
commissioned officers in the armed forces, 
foremen, and other personnel who must be 
able to establish rapport with others and 
[...]. The scales are published [...] to 
encouraging further [...] situations. 

* This study was made possible by a grant from 
the research funds of the Graduate School of the 
University of Minnesota. 

[...] a series of research [...] University of 
Minnesota [...] with the isolation [...] 
was standardized on a large sample of Minne- 
sota teachers (1), it was possible to identify, 
in the extremes of the distribution, two groups 
of teachers sharply differing in their ability 
to get along with pupils. The MMPI (3) 
was administered to these two groups, and 
212 completed inventories, 112 representing 
approximately the 8 per cent of teachers scor- 
ing highest and 100 the 8 per cent scoring 
lowest (among all of the public school teach- 
ers in Minnesota) on the MTAI, were obtained. 
The MMPI contains 550 items of the True- 
False type with a wide variety of content. 
When the proportions making each response 
in each of the two groups of teachers were 
compared, after being transformed to angles 
by the arc-sine transformation (5), the dif- 
ference between the groups was found to be 
significant at the 5 per cent level on 250 
items. 
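
The item comparison described above can be illustrated with the common angular form of the arc-sine transformation, phi = 2 arcsin(sqrt(p)), whose sampling variance is approximately 1/n in radians. This is a sketch under that assumption; the exact form and scaling used in reference (5) may differ, and the two proportions below are hypothetical. Only the group sizes (112 and 100) come from the text.

```python
from math import asin, sqrt


def arcsine_angle(p):
    """Angular transformation of a proportion, in radians."""
    return 2.0 * asin(sqrt(p))


def arcsine_z(p1, n1, p2, n2):
    """Approximate normal deviate for the difference between two proportions."""
    difference = arcsine_angle(p1) - arcsine_angle(p2)
    standard_error = sqrt(1.0 / n1 + 1.0 / n2)
    return difference / standard_error


# Hypothetical item: 60 per cent of the high group and 40 per cent of the
# low group answer "True"; group sizes are those reported in the text.
z = arcsine_z(0.60, 112, 0.40, 100)
significant_at_5_per_cent = abs(z) > 1.96
```

The appeal of the transformation is that the standard error depends only on the two sample sizes, so one critical difference in angles serves for all 550 items.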
The teacher who scores low on the MTAI 
describes himself in his responses (2) as gen- 
erally hostile toward others; he says that 
pupils are dishonest, insincere, untrustworthy, 
lazy, etc. His self-description also indicates 
that he: (a) adheres excessively to [...] 
standards of morality; (b) tends to dominate 
those below him and be subservient to those 
above him; and (c) prides himself on a thor- 
ough knowledge of his subject-matter. Among 
the 250 discriminating MMPI items were 
many which reflected generalized hostility to- 
ward people, and others that suggested phari- 
saic virtue. There were no items which re- 
flected the tendency toward security through 
power over people or through mastery of sub- 
ject matter. The other discriminating items 
suggested symptoms of depression, [...] 
and general neurosis. 

A total of 77 items which most obviously 
reflected hostility were chosen for a pre- 
liminary “Ho” scale, and 60 items having to 
do with virtue and morality were chosen for 
a “Pv” scale. When the MMPI answer 
sheets completed by 200 graduate students in 
education (all of whom were experienced 
classroom teachers) were scored on the two 
keys, correlations of -.45 (for “Ho”) and 
-.49 (for “Pv”) with the MTAI were ob- 
tained. On the strength of these results, fur- 
ther refinement of the scales was undertaken. 
Five clinical psychologists, working independ- 
ently, selected sets of “Ho” and “Pv” items. 
On the basis of agreement among the five, 
a final 50-item Ho key was selected. The 
reliability coefficient of this scale for the 200 
graduate students, estimated by analysis of 
variance (4), was .86. 
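
A reliability estimate "by analysis of variance" can be sketched as follows, assuming the usual persons-by-items layout in which reliability is one minus the ratio of the residual mean square to the mean square among persons. The function name and the five-person, four-item matrix are hypothetical; the sketch is not a reproduction of the authors' computation.

```python
def anova_reliability(scores):
    """Reliability from a persons x items table: 1 - MS_residual / MS_persons."""
    n = len(scores)          # number of persons
    k = len(scores[0])       # number of items
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1.0 - ms_residual / ms_persons


# Hypothetical keyed responses (1 = answered in the keyed direction).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
reliability = anova_reliability(responses)
```

For this small matrix the estimate is .80. The quantity is algebraically the same as coefficient alpha computed from item and total-score variances, which is one way to check an implementation.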

Substantial agreement among the psycholo- 
gists could be obtained on only 20 of the Pv 
items. On the basis of an internal consist- 
ency item analysis carried out on the 200 
graduate students, 30 items were added to 
produce a 50-item Pv key. The internal con- 
sistency of this scale could not be estimated 
on the 200 papers used in the item analysis, 
so the papers of 55 other graduate students 
in education were used for this purpose. The 
reliability, estimated by analysis of variance, 
was .88. 

Direct evidence regarding the validity of 
the Ho and Pv scales for predicting pupil- 
teacher rapport as measured by the MTAI, 
and indirect evidence as to their validity for 
measuring “Hostility” and “Pharisaic virtue,” 
was obtained by correlating the scores of the 
200 graduate students on the two scales with 
their scores on the MTAI. The results are 
summarized in Table 1. 
The sample contained 100 males and 100 
females; correlations and beta weights are 
presented for the two sexes separately in the 
first two columns, and correlations for the en- 
tire sample in the third column. 


Table 1 
Relationships Among Ho, Pv, and MTAI Scales 


                                Males    Females   Total 
Correlation Coefficients        N=100    N=100     N=200 
Ho vs. MTAI                     -.44     -.45      -.44 
Pv vs. MTAI                     -.38     -.54      -.46 
Ho vs. Pv                        .65      .73       .69 
Ta vs. MTAI                     -.45     -.54      -.50 
Multiple R, Pv + Ho vs. MTAI    -.46     -.55      -.51 
Regression Coefficients 
  Beta weight for Ho            -.335    -.109 
  Beta weight for Pv            -.163    -.463 


Table 2 


Items Included in the Ho (Hostility) Key for the 
Minnesota Multiphasic Personality Inventory * 


(Listed according to number on the Group Form) 


19 136 265 386 455 
28 148 271 394 458 
52 157 278 399* 469 
59 183 280 406 485 
71 226 284 410 504 
89 237* 292 411 507 
93 244 319 426 520 
110 250 348 436 531 
117 252 368 438 551 
124 253* 383 447 558 


* Items marked with an asterisk are keyed “False”; 
all other items are keyed “True.” 


The Ho scale tends to be more effective for 
males than the Pv scale, while the reverse 
holds for females, although none of the sex 
differences is statistically significant. In the 
multiple regression equation for predicting 
MTAI scores from Ho and Pv scores for 
males, the addition of the Pv scale does not 
significantly improve on the prediction from 
the Ho scale alone. In the multiple regres- 
sion equation for predicting MTAI scores 
from Ho and Pv scores for females, the addi- 
tion of the Ho scale does not significantly im- 
prove on the prediction from the Pv key 
alone. 
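
The beta weights and multiple R in Table 1 can be checked from the three correlations alone, using the standard two-predictor formulas. The sketch below is offered as an illustration rather than the authors' procedure; the function name is hypothetical, and the inputs are the male correlations as printed in Table 1.

```python
from math import sqrt


def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized weights and multiple R for a criterion and two predictors."""
    denominator = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denominator
    beta_2 = (r_y2 - r_y1 * r_12) / denominator
    multiple_r = sqrt(beta_1 * r_y1 + beta_2 * r_y2)
    return beta_1, beta_2, multiple_r


# Males, Table 1: Ho vs. MTAI = -.44, Pv vs. MTAI = -.38, Ho vs. Pv = .65.
beta_ho, beta_pv, multiple_r = two_predictor_regression(-0.44, -0.38, 0.65)
```

The computed weights round to -.334 and -.163, agreeing with the tabled -.335 and -.163 within rounding of the input correlations. The multiple R of .46 barely exceeds the Ho correlation of .44, which is the sense in which Pv adds little for males. A multiple R is non-negative by definition; the negative signs in the table presumably indicate the direction of the relation.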


Table 3 


Items Included in the Pv (Pharisaic Virtue) Key for the 
Minnesota Multiphasic Personality Inventory * 
(Listed according to number on the Group Form) 


13 147 356 401* 468 
26 158 357 402 470 
30* 176* 361 404 492 
45* 206 375 413 499 
58 232 378 414 502 
94 289 380 416 506 
111 317 390 439 509 
112 336 392 443 510 
119 337 395 457 548 
129 338 397 461 564 


* Items marked with an asterisk are keyed “False”; 
all other items are keyed “True.” 




Table 4 
Norms for Hostility (Ho) Scale of MMPI 


T Score T Score ies T Score 
Raw — Raw — Tor > F 
Score M F Score M F - 
71 16 46 4 

ry a J 3 A 70 15 H z 
48 90 92 31 66 68 14 p 3 
47 8 91 30 65 67 13 u 
46 8&7 89 29 64 66 12 AO D 
45 8&6 88 28 62 6 11 39 = 
44 84 87 27 61 63 10 37 a8 
43 83 85 26 59 61 9 36 = 
42 8&2 84 25 58 60 8 35 pe 
41 80 82 24 57 59 7 33 z 
40 79 84 23 55 57 6 2 g 
39 77 80 22 54 56 5 30 5 
38 76 78 21 53 54 4 w 2 
= i m 20 5133 3 _— 
36 73 75 19 50 52 2 26 7 
35 72 74 18 48 50 1 25 25 
34 71 73 17 47 49 0 23 


The correlations obtained with multiple re- 
gression weights on both scales combined are 
practically identical with those obtained when 
the two scales are thrown together into one 
100-item “Ta” (teacher attitude) scale. If 
the best predictor of teacher-pupil rapport for 
both sexes is desired, the Ta scale would 
probably be the most satisfactory. 
The magnitude of the intercorrelation be- 
tween the two scales is enough smaller than 

Table 5 
Norms for Pharisaic-Virtue (Pv) Scale of MMPI 
T Score T Score 
Raw Raw Tiie Raw = 
Score M F Score M F Score M pee 
50 9 91 33 7 3 A 
49 98 90 32 A = y 45 38 
48 96 88 31 70 63 14 ao 
47 94 87 30 68 62 13 a 3 
46 93 85 29 66 60 12 o 3 
45 9% s4 28 65 59 ul 33 
44 9 82 27 63 s7 10 37 32 
43 8 81 26 2 = A s 2 
2 87 79 23 60 34 A 34? 
41 8 78 24 59 53 7 a oz 
40 84 76 23 5 s1 6 31 a 
39 8&2 75 22 55 s0 5 29 ms 
38 80 73 21 54 48 4 27 GA 
37 7 72 20 52 47 3 26 = 
36 7o mo 19 51 45 2 24 s 
35 76 69 18 49 44 1 23 5 
34 74 6 17 48 49 0 a} 




their reliabilities to suggest that they are 
measuring different, although highly related, 
dimensions of personality. 

Lists of the items included in the two scales 
are presented as Tables 2 and 3, all items be- 
ing keyed “true” except those marked with 
an asterisk. The numbers given are those on 
the group form of the MMPI (3). A key 
for either of the scales may be easily pre- 
pared by making a scoring stencil perforated 
as indicated in these tables. 

If it is remembered that these items represent 
the individual’s own description of himself, 
some insight into the personality of the 
individual who scores high on one of these 
scales may be obtained by reading the items. 

Typical items on the Ho scale are the fol- 
lowing: “I would certainly enjoy beating a 
crook at his own game,” “When someone does 
me a wrong I feel I should pay him back if I 
can, just for the principle of the thing,” “I 
have often met people who were supposed to 
be expert who were no better than I.” Thus 
revealed, the hostile person is one who has 
little confidence in his fellowman. He sees 
people as dishonest, unsocial, immoral, ugly, 


Table 6 
Norms for the Teacher Attitude (Ta) Scale (Ho plus Pv Scales) of MMPI 
T Score T Score T Score 
Raw — Raw —— Raw Painii 
Score M F Score M F Score M F 
100 100 98 66 3 n 33 46 44 
99 9 97 65 72 70 32 45 44 
98 9 96 64 71 69 31 45 43 
97 98 95 63 70 68 30 4 4 
96 97 95 62 70 67 29 83 4 
95 96 94 61 69 67 28 42 40 
94 95 93 60 68 66 27 41 40 
93 95 92 59 67 65 26 41 39 
92 94 91 58 66 64 25 40 38 
91 93 91 Cy 66 64 24 39 37 
90 92 90 56 65 63 23 3 36 
89 9 89 55 6 02 22 37 36 
88 9 88 54 6B ól 21 af 85 
87 90 87 53 62 60 20 36 34 
86 89 87 52 62 60 19 35 33 
85 88 86 51 61 59 18 34 3 
84 87 85 50 60 58 17 3 32 
83 86 84 49 59 57 16 33 = 
82 86 83 48 58 56 15 ? 2 
81 85 83 47 58 56 14 3 2 
80 84 82 46 57 55 13 30 no 
79 g3 81 45 56 54 12 2 28 
78 82 80 44 55 53 11 2 = 
17 8&2 79 43 54 52 10 
42 53 52 9 a7 Z 
is a n 31 8 25 25 
75 80 78 41 53 5 
40 52 50 7 25 24 
74 2 m 39 si 49 6 24 83 
7 B 38 50 48 5 3 22 
is 1 37 49 48 4 23 2 
A mB 36 49 47 3 23 Zi 
E h 35 48 46 2 a 20 
68 74 72 34 47 45 19 


67 4 71 




and mean, and believes they should be made 
to suffer for their sins. Hostility amounts to 
chronic hate and anger. 

Among the 20 “core” items on the Pv scale 
are items like the following: “I believe that a 
person should never taste an alcoholic drink,” 
“Sexual things disgust me,” and “I deserve 
severe punishment for my sins,” suggesting 
preoccupation with ideas of sin and punish- 
ment; among the 30 items added by item 
analysis such items as “I am inclined to take 
things hard,” “It makes me nervous to have 
to wait,” and “Dirt frightens or disgusts me,” 
suggest general neurosis. 

Norms for the Ho, Pv, and Ta scales were 
derived on a sample of the same normal 
group that was used in deriving the norms for 
the original clinical scales of the MMPI. 
The sample consisted of 541 individuals, 226 
males and 315 females. These norms for 
males and females are presented in Tables 4, 
5, and 6. 
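
The norm tables appear to follow the usual linear T-score conversion, T = 50 + 10(x - M)/SD. The sketch below rests on that assumption. The mean of 37.5 and standard deviation of 12.5 are not reported by the authors; they are back-solved from two legible male entries in Table 6 (raw 50 gives T 60, raw 100 gives T 100) and serve only as an illustration.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T score: mean 50 and standard deviation 10 in the norm group."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Assumed (back-solved) male norm parameters for the Ta scale; not from the text.
TA_MALE_MEAN = 37.5
TA_MALE_SD = 12.5

t_for_raw_50 = linear_t_score(50, TA_MALE_MEAN, TA_MALE_SD)
t_for_raw_75 = linear_t_score(75, TA_MALE_MEAN, TA_MALE_SD)
t_for_raw_100 = linear_t_score(100, TA_MALE_MEAN, TA_MALE_SD)
```

With these parameters a raw score of 75 converts to T = 80, which matches the legible male entry for raw score 75 in Table 6 and supports reading the tables as linear conversions.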


Summary 


The development of two keys for the Min- 
nesota Multiphasic Personality Inventory by 
selecting principally on the basis of content 
two sets of 50 items from 250 found to dis- 
criminate significantly between teachers scor- 
ing high and teachers scoring low on the Min- 
nesota Teacher Attitude Inventory is de- 
scribed and the items are listed. The Ho 


scale (Hostility) reveals a type of individual 




characterized by a dislike for and distrust of 
others. The Pv scale (Pharisaic virtue) re- 
veals a type of person who describes himself 
as preoccupied with morality and ridden with 
fears and tensions. A Ta (Teacher attitude) 
scale made up of all 100 items is also 
proposed. When administered to a rather 
homogeneous group of graduate students in 
education classes, the internal consistency 
reliability coefficients of the two short scales 
were estimated to be .86 (for Ho) and .88 
(for Pv), and the Ho, Pv, and Ta scales 
correlated -.44, -.46, and -.50, respectively, 
with the Minnesota Teacher Attitude In- 
ventory. 


Received November 3, 1953. 


References 


1. Cook, W. W., and Hoyt, C. J. Procedure for de- 
termining number and nature of norm groups 
for the Minnesota Teacher Attitude Inventory. 
Educ. psychol. Measmt., 1950, 12, [pages illegible]. 

2. Cook, W. W., Leeds, C. H., and Callis, [...]. 
Minnesota Teacher Attitude Inventory. New 
York: The Psychological Corporation, [year illegible]. 

3. Hathaway, S. R. and McKinley, J. C. The 
Minnesota Multiphasic Personality Inventory. 
Minneapolis, Minnesota: The University of 
Minnesota Press, 1943. 

4. Hoyt, C. J. Test reliability obtained by analysis 
of variance. Psychometrika, 1941, 6, 153-[illegible]. 

5. Zubin, J. A transformation function for propor- 
tions and percentages. J. appl. Psychol., [year 
illegible], 19, 213-220. 




The Journal of Applied Psychology 


The Relationship of Job Values and Desires to Vocational 
Aspirations of Adolescents 


Stanley L. Singer 


Valley Psychological Consultants, Van Nuys 


and Buford Stefflre 


Counseling and Guidance Service, Los Angeles Board of Education 


Two relatively recent developments in 
counseling theory relate to the importance of 
understanding the individual’s “level of vo- 
cational aspiration” and “job values and de- 
sires.” These two personality dimensions are 
assuming increasing importance in our at- 
tempts to explore the dynamics of vocational 
selection and adjustment. 

Vocational counselors have long been aware 
of the importance of level of vocational as- 
piration as a guidepost in making long range 
plans because realism in level of aspiration 
does much to overcome pressures for the se- 
lection of vocational goals which might lead 
to much frustration. Because of the impor- 
tance of vocational aspiration level to mental 
health, research in this area is greatly needed, 
particularly as an aid in disclosing the rela- 
tionship of aspiration level to other aspects 
of vocational selection. 

Another area where research is needed is 
that of job values and desires, which are also 
of great importance in making vocational 
plans. By job values and desires are meant 
the answers given to the basic question, 
“What do you really want from a job?” Job 
values and desires refer not to the kind of 
work or duties performed, but to the source 
of satisfaction in the work and are defined in 
this study as the choices listed in the follow- 
ing Job Values and Desires Checklist. 


Centers’ Job Values and Desires Checklist 


If you had a choice of one of these kinds of 
jobs, which would you choose? (Put a number 
“1” by your FIRST choice. If you have OTHER 
choices which you would like to indicate, put a 
number “2” by your second choice and a number 
“3” by your third.) 


___ A. A job where you could be a leader. 
___ B. A very interesting job. 
___ C. A job where you would be looked upon 
       very highly by your fellow men. 
___ D. A job where you could be boss. 
___ E. A job which you were absolutely sure 
       of keeping. 
___ F. A job where you could express your 
       feelings, ideas, talent, or skill. 
___ G. A very highly paid job. 
___ H. A job where you could make a name 
       for yourself, or become famous. 
___ I. A job where you could help other peo- 
       ple. 
___ J. A job where you could work more or 
       less on your own. 


Centers¹ has done extensive work on this 
problem with adults from different social 
classes as well as from rural and urban en- 
vironments. His major finding was that self- 
expression is a “middle class” job value while 
security is a “working class” value. 

The present study attempts to examine the 
job values and desires of adolescents in rela- 
tion to level of aspiration as measured by the 
Level of Interest section of the California 
Occupational Interest Inventory. The prob- 
lem being explored here is whether differences 
in level of aspiration are reflected in differ- 
ences in job values and desires. It is hoped 
that some understanding of the relationship 
between the concepts of level of interest and 
of job values may follow from such an ex- 
ploration. 

Some justification is needed for considering 
the Level of Interest section as a measure of 
vocational aspiration. The manual² for the 
interest inventory gives no evidence indicat- 
ing that scores on the Level section are in 
any way associated with differences in as- 
piration level. However, Stefflre³ found that 
when scores on the Level of Interest section 
were compared to an independent measure of 
vocational aspiration (the client’s vocational 
objective), subjects aspiring to the higher 
level occupations made significantly higher 
scores. Stefflre concluded, in speaking of the 
Level section, “This section of the test would 
appear to be a good rough index of the direc- 
tion and extent of the student’s aspiration as 
it will be expressed through the selection of a 
vocational objective.” This research on over 
1,000 high school seniors suggests that the 
Level of Interest section on the Lee-Thorpe 
Occupational Interest Inventory is an ade- 
quate measure of vocational aspiration. 


¹ R. Centers. Psychology of social class. Prince- 
ton, New Jersey: Princeton University Press, 1949, 
219 pp. 

² E. A. Lee and L. A. Thorpe. Manual of direc- 
tions, Occupational Interest Inventory, Advanced 
Series. Hollywood: California Test Bureau, 1943. 

The present study compared the job values 
and desires of seventeen- and eighteen-year- 
old Caucasians who scored in the lower quar- 
ter on the Level of Interest section of the 
California Occupational Interest Inventory to 
similar groups scoring in the upper quarter 
on the same section. The null hypothesis is 
that differences in level of aspiration are un- 
related to the preference for job values and 
desires. The sample was composed of 212 
male high school seniors and 242 female high 
school seniors from the Los Angeles City 
Schools. Separate analyses were made for 
males, for females, and for a combined sample 
of both sexes. Chi square with the Yates 
correction was applied to examine the rela- 
tionships. 
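
The test named above can be sketched for a single category as a 2 x 2 table (upper or lower quarter by chose or did not choose the category). The function is a standard Yates-corrected chi square written out by hand. The counts are approximations reconstructed from the rounded percentages reported for Self-Expression in the combined group (28 per cent of 285 and 15 per cent of 169), so the result is illustrative and not the authors' figure.

```python
def yates_chi_square(a, b, c, d):
    """Chi square with Yates continuity correction for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


# Approximate counts reconstructed from rounded percentages (assumption):
# upper quarter: 80 chose Self-Expression, 205 did not (N = 285)
# lower quarter: 25 chose Self-Expression, 144 did not (N = 169)
chi_square = yates_chi_square(80, 205, 25, 144)
beyond_1_per_cent = chi_square > 6.635  # critical value for 1 degree of freedom
```

With these reconstructed counts the statistic is about 9.8, beyond the 1 per cent critical value for one degree of freedom, which is consistent with the significance level the authors report for this category in the combined group.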

All subjects had participated in a special- 
ized vocational guidance program made avail- 
able to them during the 1952-53 school year. 
The guidance program consisted of seven 
steps: (1) initial structuring meeting during 
which the entire counseling program was ex- 
plained; (2) basic testing which measured 
mental capacity, interest, and temperament; 
(3) initial interview with a counselor to re- 
late test results and personal-social back- 
ground to tentative vocational objectives; 
(4) study of occupational information; (5) 
additional testing as needed; (6) final inter- 
view to plan objectives and training; and 
(7) invitation to the parents to discuss the 
student’s plans with the counselor. 

³ B. Stefflre. Psychological factors associated with 
aspiration for socio-economic mobility. Unpublished 
dissertation, University of Southern California, June 
1953. 

During the basic testing period, the meas- 
ure of interest used was the California Occu- 
pational Interest Inventory. This test [covers] 
six fields of interest: Personal-Social, Natu- 
ral, Mechanical, Business, Arts, and Sciences; 
three types of interest (Verbal, Manipula- 
tive, and Computational); and a Level of In- 
terest section which has been discussed [above] 
as a measure of vocational aspiration. The 
present study was only concerned with the 
last section of the test. 

Centers’ Job Values and Desires Checklist 
was used as the index of the student’s job 
value preferences. The card was presented 
and checked during the first interview. 

Consistently the percentage of respondents 
selecting category B (“Interesting experi- 
ence”) and category F (“Self-expression”) 
was far above the percentage selecting any of 
the other categories. This finding was appar- 
ent for both the males and females as well 
as for the combined group. Only the lower 
quarter male group did not show a [...] 
preference for “self-expression.” The three 
categories selected least often were “power,” 
“leadership,” and “esteem.” 

Table 1 summarizes the results for the 
males. Chi square was significant in two 

Table 1 


i 1 of 
Chi Square of Upper and Lower Quarters on Leve 
Interest Section and Job Values and 
Desires for Males 


Upper Lower 
Quarter Quarter 
(N = 148) (N = 64) 
Category % % 
A. Leadership 5 ; 
B. Interesting Experience 18 
C. Esteem 2 
D. Power 4 ; 
E. Security 12 1y 
F. Self-Expression 29 8 
G. Profit 11 ó 
H. Fame 4 9 
I. Social Service 7 19* 
J. Independence 8 


* Significant at 5 per cent level. 
** Significant at 1 per cent level. 


——— 




Table 2 


Chi Square of Upper and Lower Quarters on Level of 
Interest Section and Job Values and 
Desires of Females 


Upper Lower 

Quarter Quarter 

(N = 137) (N= 105) 
Category % % 
A. Leadership 4 2 
B. Interesting Experience 26 31 
C. Esteem 2 A 
D. Power 1 s 
E. Security 6 me 
F. Self-Expression 28 4 
G. Profit 4 E 
H. Fame A s 
I. Social Service 18 a 
J. Independence 7 


Note: No differences between upper and lower quar- 
ters were significant. 


of the comparisons. On category F (“A job 
where you could express your feelings, ideas, 
talent, or skill”) P was significant beyond 
the 1 per cent level of confidence. Selection 
of this value is positively related to high vo- 
cational aspiration level. On item J (“A job 
where you could work more or less on your 
own”) chi square was significant at the 5 
per cent level of confidence. Here the males 
falling in the bottom quarter on vocational 
aspiration tended to select the value of job 
“independence” more often than the group in 
the upper quarter. 

Table 2 summarizes the findings for the 
females. It is apparent from the results that 
scores on the Level of Interest section had no 
significant relationship to the selection of job 
values and desires. 

Table 3 presents the results for the com- 
bined group of males and females. Two of 
the comparisons were statistically significant. 
Category A (“A job where you could be a 
leader”) was preferred by more of the subjects 
in the upper quarter on the aspiration measure 
than would be expected. By the same 
token, those falling in the lower quarter 
tended to underselect this particular value. 
“Leadership,” it will be recalled, showed no 
relationship to vocational aspiration score 
when considered for males and females sepa- 
rately, and was one of the categories least 
selected by all groups. 

Category F—“A job where you could ex- 
press your feelings, ideas, talent, or skill”— 
was significantly overselected by the group 
scoring in the upper quarter on the Level 
section while this same job value was of little 
concern to those scoring in the bottom quar- 
ter on the aspiration measure. It will be re- 
called that “self-expression” was significantly 
related to score on Level of Interest for the 
males also, although not for the females. 

Summarizing the findings then, males who 
demonstrate high level of vocational aspira- 
tion are relatively more concerned with job 
values and desires that involve “self-expres- 
sion.” On the other hand, males who demon- 
strate low vocational aspiration are relatively 
more concerned with the job value of “inde- 
pendence.” For adolescent females there ap- 
pears to be no significant relationship be- 
tween aspiration level and job values. For 
the combined group of males and females, 
desires for “leadership” and “self-expression” 
are positively related to high vocational as- 
piration. 

The negative findings for females may mean 
that adolescent girls select job values from 
very personal motives unrelated to aspira- 
tions for social status. Since the eventual 


Table 3 


Chi Square of Upper and Lower Quarters on Level of 
Interest Section and Job Values and Desires 
for Combined Group 


                             Upper       Lower 
                             Quarter     Quarter 
                             (N = 285)   (N = 169) 
Category                     %           % 
A. Leadership                5           1 
B. Interesting Experience    22          30 
C. Esteem                    2           4 
D. Power                     2           2 
E. Security                  9           12 
F. Self-Expression           28          15** 
G. Profit                    8           5 
H. Fame                      4           6 
I. Social Service            12          14 
J. Independence              8           11 


* Significant at 5 per cent level. 
** Significant at 1 per cent level. 




socio-economic status of a girl is more likely 
to be determined by marriage than by her 
occupation, it is possible that her strivings 
for social status are not reflected in voca- 
tional values and desires. 

In a review of the findings, certain similari- 
ties to Centers’ results become apparent. It 
must be kept in mind in making this com- 
parison that the present study examined the 
relation of expressed job values to a Level of 
Interest scale whose connection with ultimate 
occupational status and socio-economic status 
has not been established, while Centers 
studied the relation of expressed job values 
to known socio-economic status (middle class 
or working class status). The present study 
found a preference for “self-expression” (in 
males and in combined sex sample) and for 
“leadership” (in combined sex sample) to be 
related to a high level of vocational aspira- 
tion; Centers found a preference for “self- 
expression” and to some extent, “leadership” 
to be related to membership in the middle 
class. The present study found preference 
for “independence” to be related to a low 
level of vocational aspiration; Centers noted 
a tendency for “leadership” preference to be 
related to membership in the working class.
The findings of the two studies, when compared
in this manner, suggest the need to examine


Stanley L. Singer and Buford Stefflre


more closely the relationship between
level of vocational interest in adolescents and 
socio-economic status. It is possible that the 
adolescent with a high level of vocational 
aspiration identifies himself with the middle 
class and hence views job values in the man- 
ner of adult middle class members while the 
adolescent with a low level of vocational as- 
piration may identify himself with the work- 
ing class. 


Summary 


This study has examined the relationship 
between: (1) level of aspiration, as measured 
by the Level of Interest section of the Occu- 
pational Interest Inventory; and (2) job 
values and desires, as measured by the check- 
list developed by Centers. For the male 
group it was demonstrated that a relation-
ship exists between these two variables for
some job values and that this relationship 
seems to be in line with that noted by Centers 
when he examined the role of socio-economic 
differences in the selection of job values and
desires. Such a finding gives some tentative 
and indirect support to the belief that scores 
on the Level of Interest section may indicate 
the socio-economic status with which the ado- 
lescent male identifies himself. 


Received December 3, 1953.


The Journal of Applied Psychology
Vol. 38, No. 6, 1954


Permanence of Strong Vocational Interest Blank Scores 1


Kalmer E. Stordahl


University of Minnesota 2


In assisting young men and women to make 
appropriate vocational and educational choices,
counselors make extensive use of tested inter- 
ests. One of the problems with which both 
the counselor and counselee are concerned is 
the permanence or stability of the scores on 
the measuring instrument. 

The present study was designed to give an 
estimate of the permanence of the scores of 
pre-college males on Strong’s Vocational In- 
terest Blank. For an excellent review of the 
literature up to 1943 on the permanence of 
Strong scores see Strong (7). Two recent 
studies of the permanence of scores over a
number of years have also been published by 
Strong (5, 6). 


Method 


In the spring of 1949, the Vocational Interest
Blank was offered on an optional basis to all
high school seniors who participated in the state-
wide testing program in the state of Minnesota.
Approximately 3,500 senior boys completed the
blank.

A check was made of the University enroll-
ment in the spring of 1951 and it was found that
there were 331 boys enrolled who had completed
the blank in 1949. To determine whether or not
the interests of those boys who moved from a
predominantly rural environment to a metro-
politan one had changed more than the interests
of those boys who remained in a metropolitan
environment, the 331 boys were divided into two
groups. Those who graduated from high schools
in Minneapolis, St. Paul, and their immediate
suburbs were designated as a “metropolitan”
group (N = 250). The second group, all of
whom had graduated from high schools in cities
of less than 20,000, were designated as a “non-
metropolitan” group (N = 81).

A random sample of 125 boys was chosen from
the metropolitan group. These boys plus the 81
boys in the non-metropolitan group were contacted
in the spring of 1951 and asked to complete the
Strong blank; 182, 88 per cent, complied. One blank


1 This paper is based upon a portion of a Ph.D.
thesis submitted to the graduate faculty of the Uni-
versity of Minnesota. The author wishes to acknowl-
edge the guidance of his advisor, Dr. Willis E.
Dugan.
2 Now at Arkansas Polytechnic College.


was unusable so that the scores of 181 boys 
were used in the study. 

The minimum time between test and retest 
was two years and the maximum time did not 
exceed 2.5 years. The mean ages, at the time of 
the retest, of the metropolitan and non-metro- 
politan groups were 19.7 and 19.9 respectively. 
This difference was not statistically significant 
(P > .05).

The median high school percentile rank of the 
metropolitan boys was 74.1 and that of the non- 
metropolitan boys was 79.9. The mean ACE 
Psychological Examination score of the metro- 
politan group was 119.69 and the mean for the 
non-metropolitan group was 122.06. These differ- 
ences were not statistically significant (P > .05). 

The tests and retests for the 181 subjects were 
scored on 44 occupational keys and for Interest 
Maturity, Occupational Level, and Masculinity- 
Femininity. Also, using Darley’s criteria (1), 
judgments of patterns were made for the eleven 
occupational interest groups. All judgments were 
made independently by two persons and in those 
cases where disagreement was found a third per- 
son made a third independent judgment. When 
more than two judges were needed the pattern 
was designated as that on which two of the three 
judges were in agreement. Thus, each of the 
eleven interest groups for each subject was scored 
as being a primary, secondary, tertiary, or “no” 
pattern. The third judge was needed for ap- 
proximately five per cent of the judgments made. 


Results 


Permanence of Mean Scores. One way of 
measuring the permanence of scores is to de- 
termine the stability of means between ad- 
ministrations. This was done separately for 
the 111 metropolitan and 70 non-metro- 
politan boys. The means and variances of 
the standard scores for 44 occupational and 
3 non-occupational scales are given in 
Table 1.3

The significance of the difference between 
the test and retest means for each key was 


3 Table 1 has been deposited with the American
Documentation Institute. Order Document No. 4239
from the ADI Auxiliary Publications Project, Photo-
duplication Service, Library of Congress, Washington
25, D. C., remitting in advance $1.25 for 35 mm.
microfilm or $1.25 for 6 X 8 in. photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.


tested by means of the t test, taking the 
correlation into account. Since the F test 
showed that the variances were significantly 
different in some cases, the assumption of ho- 
mogeneity of variances did not hold in all 
cases; this is indicated in Table 1. 
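
A t test for correlated means of the kind described here can be sketched as follows. The scores are hypothetical (Table 1 is not reproduced in this article) and the function names are illustrative only. The sketch shows that entering the test-retest correlation into the standard error gives the same t as a test on the individual difference scores.

```python
import math
from statistics import mean, variance

def correlated_t(test, retest):
    """t for the difference between two correlated means, using r explicitly."""
    n = len(test)
    m1, m2 = mean(test), mean(retest)
    s1, s2 = variance(test), variance(retest)  # sample variances (n - 1)
    cov = sum((a - m1) * (b - m2) for a, b in zip(test, retest)) / (n - 1)
    r = cov / math.sqrt(s1 * s2)               # test-retest correlation
    se = math.sqrt((s1 + s2 - 2 * r * math.sqrt(s1 * s2)) / n)
    return (m2 - m1) / se, r

def difference_score_t(test, retest):
    """Equivalent t computed directly from the retest-minus-test differences."""
    d = [b - a for a, b in zip(test, retest)]
    return mean(d) / math.sqrt(variance(d) / len(d))

# Hypothetical standard scores for six boys on one scale.
test_scores = [30, 42, 25, 38, 45, 33]
retest_scores = [34, 45, 24, 43, 47, 39]

t_value, r_value = correlated_t(test_scores, retest_scores)
print(f"r = {r_value:.2f}, t = {t_value:.2f}")
```

The two functions agree because the variance of the difference scores equals the sum of the two variances minus twice the covariance.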

There was a significant difference between 
the means of the two administrations at the 
.01 level on 24 scales for the non-metropoli- 
tan group and on 26 scales for the metropoli- 
tan group. Twenty-two of these scales were 
common to the two groups. The significant 
changes in means between the test and retest, 
as shown in Table 1, were in both the posi- 
tive and the negative direction. The direc- 
tion, in all instances where there was a sig- 
nificant difference between administrations at 
the .01 level, was the same for the metropoli- 
tan and non-metropolitan groups. 

The interest group which showed the larg- 
est and most consistent changes was Group V. 
All the scales within this group showed a sig- 
nificant increase in mean score for both the 
metropolitan and non-metropolitan boys. Of 
the non-occupational scales, Interest Maturity 
was the only one which changed significantly.
As would be expected, the mean on this scale 
increased for both the metropolitan and non- 
metropolitan boys. 

To determine whether or not there was a 
difference between the metropolitan and non- 
metropolitan groups with respect to stability 
of mean scores the means of the difference 
scores (test minus retest) were compared for
each scale. When the variances were found 
to be homogeneous by means of the F test, 
the t test was used. When the variances 
were not homogeneous, the approximate 
method proposed by Cochran and Cox to 
test the hypothesis of equality of means with 
no hypothesis about the population variance
was used (2). None of the differences be-
tween the means of the difference scores of
the two groups were significant at the .01
level; three (Aviator, Vocational Agriculture
Teacher, and Sales Manager) were found to
be significantly different at the .05 level.

Test-Retest Correlation. The test-retest 
scores for each of the scales were plotted and 
in all cases the relationship between test and 


retest appeared, by inspection, to be linear. 




Table 2


Correlations Between Two Administrations of the Vocational
Interest Blank for 111 Metropolitan and 70 Non-Metropolitan Males


                                        Metro-         Non-Metro-
                                        politan        politan
Group  Scale                            Males          Males
I      Artist                           .12            [illegible]
       Psychologist                     .77            .67
       Architect                        .75            .68
       Physician                        .15            .64
       Osteopath                        .71            .63
       Dentist                          .71            .60
       Veterinarian                     .74            .61
II     Mathematician                    .59            .66
       Physicist                        [illegible]    .69
       Engineer                         .18            .19
       Chemist                          .79            [illegible]
III    Production Manager               .62            .16
IV     Farmer                           .85            .76
       Aviator                          .78            .81
       Carpenter                        .80            [illegible]
       Printer                          .61            .60
       Math. Phys. Science Tchr.        .72            .61
       Industrial Arts Tchr.            [illegible]    [illegible]
       Voc. Agri. Tchr.                 .81            .68
       Policeman                        .69            .60
       Forest Service Man               .80            .66*
V      YMCA Physical Director           .72            .66
       Personnel Director               .60            .49
       Public Administrator             .62            .45
       YMCA Secretary                   .62            .60
       Soc. Science H. S. Tchr.         [illegible]    [illegible]
       City School Supt.                .70            .69
       Minister                         .67            .70
VI     Musician                         .68            .76
VII    CPA                              .63            .60
VIII   Senior CPA                       .65            .62
       Accountant                       [illegible]    .66
       Office Man                       .66            [illegible]
       Purchasing Agent                 .12            .17
       Banker                           .45            .64
       Mortician                        [illegible]    [illegible]
       Pharmacist                       .54            .59
IX     Sales Manager                    .77            .62
       Real Estate Salesman             .12            [illegible]
       Life Insurance Salesman          [illegible]    [illegible]
X      Advertising Man                  [illegible]    [illegible]
       Lawyer                           .43            [illegible]
       Author-Journalist                .10            [illegible]
XI     President Mfg. Concern           .67            [illegible]
       Interest Maturity                .66            [illegible]
       Occupational Level               .70            .37
       Masculinity-Femininity           .76            [illegible]


* Difference significant at the .05 level.




Product moment correlation coefficients were 
then computed between the test and retest 
for each of the scales. These were computed 
separately for the metropolitan and non- 
metropolitan group. The correlations are 
given in Table 2.

None of the observed differences between 
the test-retest correlations for the metropoli- 
tan and non-metropolitan groups were found 
to be significant at the .01 level. Only one, 
that for Forest Service Man, was found to 
be significant at the .05 level. 
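
The article does not say how the two groups' correlations were compared. A minimal sketch, assuming Fisher's r-to-z transformation for two independent samples (an assumption, not the author's stated method), applied to the Forest Service Man values of .80 (N = 111) and .66 (N = 70) from Table 2:

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Return (z, two-sided p) for the difference between two independent r's."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-sided normal probability
    return z, p

# Forest Service Man: metropolitan vs. non-metropolitan test-retest r.
z_stat, p_value = fisher_z_difference(0.80, 111, 0.66, 70)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Under this assumption the statistic falls just above the 1.96 needed at the .05 level and well short of the 2.58 needed at the .01 level, which agrees with the result reported in the text.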

Permanence of Letter Grade Scores. To 
get a measure of permanence in terms of let- 
ter grade scores, the change in letter grades 
between test and retest was determined. For 
each Strong scale a tabulation was made of 
the letter grade obtained on the retest for 
each letter grade received on the test. Since 
the comparisons of mean scores and of cor- 
relations for the metropolitan and non-metro- 
politan groups indicated that the two groups 
were similar with respect to permanence of 
scores, the metropolitan-nonmetropolitan clas- 
sification was not retained for this part of 
the study. The two groups were pooled and 
treated as a single sample. 

Table 3 gives the amount of change in let- 
ter grades between test and retest when all 
occupational scales are summed. Because of 
space limitations a breakdown by individual 
keys is not included here. For such a de- 
tailed breakdown see Stordahl (4). 

A chi-square test for independence of letter 
grade and permanence was made by summing 
the letter grades over all scales and classify- 
ing the change in letter grades between test 
and retest into two categories: “identical”
and “not identical.” The hypothesis of inde- 
pendence of permanence and letter grade was 
rejected (P < .001). Table 3 indicates that
on the average, C grades were the most stable, 
68 per cent of the C grades on the first test 
being C grades on the second test. The sec- 
ond most stable letter grade was A, with 60 
per cent of the letter grades being identical
on the test and retest. The intermediate let-
ter grades were less stable. By combining
the letter grades so that B included B +, B, 
and B —, and C included C + and C, it was 
found that 73 per cent of the C grades on the 


Table 3 


Change in Letter Grade Scores on the Vocational 
Interest Blank for 181 Boys Tested as High 
School Seniors and Retested Two Years 
Later as College Students 


Test                             Retest
Letter               %     %     %     %     %     %
Grade      N         C     C+    B-    B     B+    A     Total
A            804     2     3     6    10    19    60     100
B+           761     4     6    13    21    26    30     100
B          1,106     8    12    20    23    20    17     100
B-         1,394    19    16    24    20    12     9     100
C+         1,300    31    23    22    15     6     3     100
C          2,599    68    15     9     5     2     1     100


test remained C grades on the retest and that 
59 per cent of the A and 59 per cent of the 
B grades remained constant. 
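
The combined-grade percentages can be recomputed from Table 3 along the following lines. This is only a sketch: it works from the rounded percentages printed in the table, it takes the C row's entry under B as 5 (the value that makes that row sum to 100), and the variable and function names are illustrative.

```python
RETEST_ORDER = ["C", "C+", "B-", "B", "B+", "A"]

# Test grade: (N, per cent receiving each retest grade, in RETEST_ORDER).
TABLE3 = {
    "A":  (804,  [2, 3, 6, 10, 19, 60]),
    "B+": (761,  [4, 6, 13, 21, 26, 30]),
    "B":  (1106, [8, 12, 20, 23, 20, 17]),
    "B-": (1394, [19, 16, 24, 20, 12, 9]),
    "C+": (1300, [31, 23, 22, 15, 6, 3]),
    "C":  (2599, [68, 15, 9, 5, 2, 1]),
}

# Collapse B+, B, B- into B and C+, C into C, as described in the text.
BROAD = {"C": "C", "C+": "C", "B-": "B", "B": "B", "B+": "B", "A": "A"}

def broad_stability(table):
    """Per cent of each broad test grade that kept the same broad grade on retest."""
    kept, total = {}, {}
    for grade, (n, pcts) in table.items():
        broad = BROAD[grade]
        same_pct = sum(p for g, p in zip(RETEST_ORDER, pcts) if BROAD[g] == broad)
        kept[broad] = kept.get(broad, 0.0) + n * same_pct / 100.0
        total[broad] = total.get(broad, 0) + n
    return {b: 100.0 * kept[b] / total[b] for b in total}

for grade, pct in sorted(broad_stability(TABLE3).items()):
    print(f"{grade}: {pct:.1f} per cent unchanged")
```

Because the inputs are rounded table entries, results can differ by a point from the published figures: the C and B values round to the 73 and 59 per cent given in the text, while the A value comes out at 60 rather than 59.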

Permanence of Interest Patterns. The 
permanence of interest patterns over the two 
year period is summarized in Table 4. Here, 
as for the letter grades, the data are pre- 
sented for the metropolitan and non-metro- 
politan groups combined and for all interest 
groups combined. 

Chi-square was used to test the independ- 
ence of interest pattern and permanence by 
summing the patterns over all groups and 
classifying the change between tests as 
“identical” and “not identical.” The hy- 
pothesis of independence was rejected (P < 
.001). 

As can be seen from Table 4, the primary 
and “no pattern” patterns were found to be 
the most stable with 58 per cent of the pri- 


Table 4 


Change in Interest Patterns on the Vocational Interest 
Blank for 181 Boys Tested as High School 
Seniors and Retested Two Years 
Later as College Students 


Test                      Retest
Pattern    N         % N    % T    % S    % P    Total
P            229     13     12     17     58     100
S            210     28     20     28     24     100
T            274     45     19     16     20     100
N          1,278     81      9      6      4     100




mary patterns on the first test being pri- 
mary patterns on the retest and 81 per cent 
of the “no patterns” being identical on the 
retest. The secondary and tertiary patterns 
were less stable. 
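
The chi-square test of independence between pattern and permanence can be approximated from Table 4. The sketch below is an illustration, not the author's computation: the counts of identical judgments are rebuilt from the rounded percentages, and the names are hypothetical. The critical value 16.27 is the tabled chi-square for 3 degrees of freedom at the .001 level.

```python
# Test pattern: (N, per cent with the identical pattern on retest), from Table 4.
TABLE4 = {"P": (229, 58), "S": (210, 28), "T": (274, 19), "N": (1278, 81)}

CRITICAL_001_DF3 = 16.27  # chi-square critical value, df = 3, P = .001

def chi_square_identical(table):
    """Chi-square for a 4 x 2 table of pattern by identical / not identical."""
    same = {k: n * pct / 100.0 for k, (n, pct) in table.items()}
    total = sum(n for n, _ in table.values())
    p_same = sum(same.values()) / total          # pooled proportion identical
    stat = 0.0
    for k, (n, _) in table.items():
        cells = (
            (same[k], n * p_same),               # identical: observed, expected
            (n - same[k], n * (1.0 - p_same)),   # not identical
        )
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
    return stat

chi2 = chi_square_identical(TABLE4)
print(f"chi-square = {chi2:.1f}, exceeds .001 critical value: {chi2 > CRITICAL_001_DF3}")
```

The statistic is far beyond the critical value, consistent with the rejection of independence reported above.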

Permanence of Individual Profiles. The 
stability of individual profiles was deter- 
mined for each of the 181 boys. Kendall’s 
(3) coefficient of concordance, W, was used 
as a measure of stability. The coefficient, W, 
is based on the method of ranks. It is re- 
lated to Spearman’s rho. The 44 occupa- 
tional scales were used in computing this co- 
efficient; the non-occupational scales were not 
included. When there were ties in rank, the 
median rank was assigned to each of the tied 
scales. 

The median coefficient of concordance for 
the metropolitan group was .87 and the 
median for the non-metropolitan group was 
.86. Since W has a direct relationship to
Spearman’s rho, these figures can also be ex- 
pressed in terms of rho; the median rhos be- 
ing .74 and .72. All but nine of the coeffi- 
cients for the metropolitan group and six for 
the non-metropolitan group were found to be 
significantly greater than zero. 
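
For two sets of ranks, the coefficient of concordance and Spearman's rho are linked by rho = 2W - 1, which is how medians of .87 and .86 translate to .74 and .72. A minimal sketch with hypothetical scores on five scales follows; ties are not handled here, whereas the study assigned the median rank to tied scales.

```python
def ranks(scores):
    """Rank scores from 1 (lowest) upward; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def kendall_w(rankings):
    """Kendall's coefficient of concordance for m rankings of n items."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean_total = m * (n + 1) / 2.0
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def spearman_rho(r1, r2):
    """Spearman's rho from two untied rankings."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def w_to_mean_rho(w, m=2):
    """Average rho among m rankings implied by W; reduces to 2W - 1 when m = 2."""
    return (m * w - 1.0) / (m - 1.0)

# Hypothetical test and retest standard scores for one boy on five scales.
test_ranks = ranks([45, 30, 52, 38, 27])
retest_ranks = ranks([41, 33, 55, 29, 35])
w = kendall_w([test_ranks, retest_ranks])
rho = spearman_rho(test_ranks, retest_ranks)
print(f"W = {w:.2f}, rho = {rho:.2f}, 2W - 1 = {w_to_mean_rho(w):.2f}")
```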

The homogeneity of the frequency distribu-
tions of coefficients of concordance for the
metropolitan and non-metropolitan groups
was tested by the Brandt-Snedecor chi-square
method (2). The two distributions were
found to be homogeneous (P > .05).


Discussion 


In this study a substantial relationship was
found to exist between the interest scores re-
ceived as high school seniors and as college
sophomores. This relationship was, however,
far from being a perfect one and large indi-
vidual differences in stability were evident.
The scores of the metropolitan and non-
metropolitan boys were quite homogeneous
with respect to permanence of interest scores.
The writer hypothesized that if the boys’ in-
terests had not as yet become stabilized that
some difference between these groups might
be found. Assuming that interests are largely
determined by one’s experiences, it was
thought that the change in environment for




the non-metropolitan boys might cause a 
greater change in their scores than would be 
found for the metropolitan boys, whose en- 
vironment remained relatively constant. Such 
a difference was not found.

No attempt has been made to make a di- 
rect comparison between the results of this 
study and the previous research of Strong 
and others. Such a comparison would be 
difficult since most previous research has been 
based on the original form of the blank 
whereas the revised form was used in the 
present study. However, since Strong (5, 6,
7) has reported some data on permanence
with the revised keys and since the original 
keys were very similar to the revised, some 
general comparisons can be made. 

The results of the present study, with re-
spect to permanence as measured by mean
scores, correlation, and permanence of indi-
vidual profiles, are not greatly divergent
from previous investigations. The results
also support Strong’s conclusion that we can
place the greatest confidence in C letter
grade ratings. Theoretically, as Strong has
indicated, the A and C ratings should be the
most stable as they cover a wider range of
scores than the B rating. The fact that the
A ratings were found to be no more stable
than the B can probably be accounted for by
the relatively small number of A ratings and
the tendency for them to be low A ratings.

Although no previous studies have consid-
ered permanence in terms of interest patterns,
counselors may find this the best way to look
at permanence of scores. The evidence indi-
cates that the counselor can place the most
confidence in primary patterns and “no pat-
terns” since these are apparently more stable
than secondary and tertiary patterns. This is
especially true when no pattern exists in an
interest group.


Summary 


A sample of 181 males, 111 from a large
metropolitan area and 70 from non-metro-
politan areas, who had completed Strong’s
Vocational Interest Blank as high school
seniors were retested two years later as col-
lege students. The tests and retests for the


181 boys were scored on 44 occupational 
keys and for Interest Maturity, Occupational 
Level, and Masculinity-Femininity. The test- 
retest scores on the 47 scales were compared 
in several ways to secure an estimate of the 
stability of scores over the two-year period. 
The following measures of test-retest sta- 
bility were used: permanence of mean scores, 
test-retest correlation, permanence of letter 
grade scores, permanence of interest patterns, 
and stability of individual profiles. 

A substantial relationship was found to 
exist between the interest scores received as 
high school seniors and as college sophomores. 
The metropolitan and non-metropolitan boys 
were quite homogeneous with respect to 
permanence of Strong scores.


Received December 30, 1953. 


References


1. Darley, J. G. Clinical aspects and interpretations
   of the Strong vocational interest blank. New
   York: Psychological Corp., 1941.

2. Johnson, P. O. Statistical methods in research.
   New York: Prentice-Hall, 1949.

3. Kendall, M. G. The advanced theory of statis-
   tics, Vol. I. (4th ed.) London: Charles
   Griffin & Co., 1948.

4. Stordahl, K. E. The stability of Strong Voca-
   tional Interest Blank patterns for pre-college
   males. Unpublished doctor’s dissertation, Uni-
   versity of Minnesota Library, 1953.

5. Strong, E. K., Jr. Nineteen year follow-up of
   engineer interests. J. appl. Psychol., 1952, 36,
   65-74.

6. Strong, E. K., Jr. Permanence of interest scores
   over 22 years. J. appl. Psychol., 1951, 35,
   89-91.

7. Strong, E. K., Jr. Vocational interests of men
   and women. Stanford: Stanford Univer. Press,
   1943.


The Journal of Applied Psychology
Vol. 38, No. 6, 1954


Adolescent Vocational Interests and Later Occupation 


Phyllis Rosenberg Levine 


Jewish Vocational Service, Cleveland, O.


and 


Richard Wallen? 


Western Reserve University


Although there is evidence that scores on 
the Kuder Preference Record differentiate 
among occupational groups (e.g., 2, 3, 5), 
few investigators have attempted to deter- 
mine the relationship between such scores 
and the occupation entered at a later date. 
Barnette (1) presents data indicating that 
Kuder scores are related to occupational satis- 
faction several years after counseling, but his 
subjects were young adults at the time of ad- 
visement. Since many counselors deal with 
adolescent youth, it is of some interest to find 
out whether Kuder scores obtained during the 
late adolescent period are related to actual 
occupations entered subsequently. 

The present paper reports a follow-up study
of boys counseled at the Cleveland Jewish
Vocational Service during the years 1943,
1944, and 1945. This study attempted to
determine whether a relationship existed be-
tween Kuder Preference Record (Form B)
scores at the time of counseling and the oc-
cupation engaged in at the time of the study
(1952).


Subjects 


Questionnaires designed to elicit information
about current occupation were sent to 215 men
who had been counseled and tested at JVS dur-
ing the years 1943-1945. The mailing list in-
cluded all eleventh and twelfth-grade males tested
during that period and all tenth-grade males
tested in 1943 and 1944. The original letter, a
reminder postcard, and a follow-up letter yielded
questionnaire returns from 58 per cent of the
mailing list. Of the total group, six per cent
were known to be in military service, seven per
cent had moved to unknown addresses, and one
per cent was deceased. No information of any
kind was returned for 28 per cent of the group.


1 This paper is based on a portion of a thesis sub-
mitted in partial fulfillment of the requirements for
the M.A. degree at Western Reserve University by
the first-named author and supervised by the second
author.




In order to determine the existence of bias in
the sample that returned questionnaires, some
comparisons were made between the respondent
and non-respondent groups. The mean age of
the 124 respondents at time of original testing
was 16 years and 7 months and that of the 6
non-respondents was 16 years 10 months, 3
months higher. The difference is statistically in-
significant (t = .39). Scores on the American
Council Examination (High School Form, 1942)
were available for 44 of the respondents and for
23 of the non-respondents. The means of the
two groups were not significantly different for
[illegible] or Total scores. (For Total scores, t = .88.)
When mean scores on the separate scales of the
Kuder were compared, only one difference was
significant at the five per cent level. That dif-
ference occurred between mean scores on the
musical scale, and the respondents were signifi-
cantly lower in measured musical interest. These
comparisons suggest that the respondent group is
a fairly unbiased sample of the total group to
whom questionnaires were mailed.

Further data on the respondent group were ob-
tained from the questionnaires. In terms of edu-
cational achievement, it is clear that our respond-
ents are not representative of the general male
population and are probably not representative
of males seen at the agency. Of the respondents,
79 per cent indicated that they had completed at
least four years of college. Less than two per
cent had not finished high school. This substan-
tial educational attainment is probably due in
part to the financial assistance given veterans
during the period covered by the study, but it
may also reflect the family educational aspira-
tions held by the clients of this agency.


Procedure 


The questionnaire consisted of three items ask-
ing for identifying data, one item on amount of
education, two items requesting data about pres-
ent occupation and length of time it had been en-
gaged in, four items dealing with job satisfaction,
and one item about the estimated influence of
JVS counseling on occupational choice. The
data on job satisfaction are not analyzed in the
present report, since few respondents indicated
dissatisfaction with their current occupation.



The first step in the treatment of the data was 
to classify the reported occupations of the re- 
spondents. Using the Kuder manual (3) as a 
guide, the occupations were coded as belonging 
to one or more of the nine Kuder interest areas. 
For example, mechanical engineering was classi- 
fied as belonging to the mechanical, computa- 
tional, and scientific interest groups. In most 
cases the reported occupation was classified ac- 
cording to its listing in the Kuder manual. A 
subjective judgment had to be made in a small 
number of cases: T-V producer was classed as 
persuasive, artistic, literary, and musical; gradu- 
ate student in international relations was classed 
as persuasive, literary, and social service; busi- 
ness men, executives, and those who were self- 
employed were included in the persuasive inter- 
est group. The occupational interest classifica- 
tion was entered on three by five cards con- 
taining other data about the subjects, so that 
tabulation could be done directly from the cards. 

Seven cases were eliminated from the respond- 
ent group at this point. They were either un- 
employed or were in undergraduate college. The 
final group of respondents used in this study, 
then, numbered 117. 

For each Kuder scale, the total group of re- 
spondents was divided into two sub-groups: those 
in occupations belonging to that interest area 
and all others. Mean Kuder raw scores and 
standard deviations were computed for each of 
these sub-groups, and a t-test was applied to the
differences between the means. 


Results 


Table 1 summarizes the results of the sta- 
tistical analysis. Taking the mechanical in- 
terest scale as an example, Table 1 reads as 
follows: Of the entire respondent group, 26 
were currently occupied in jobs involving me- 
chanical interest, and 91 were in jobs that did 
not require this interest. The mean mechani-
cal interest score earned by the mechanically
occupied group, seven to nine years earlier,
was 81.6. The mean score of those now in
other kinds of jobs was 69.5. The difference
between the mean scores of the two groups is
significant at the five per cent level of confi-
dence as shown by the t value of 2.46.
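
The article does not state which form of the t-test was applied. As a hedged illustration, the sketch below assumes the standard error of the difference is built from each group's SD with N - 1 in the denominator, a common textbook form of the period. That assumption (and the function name) is ours; it reproduces the Mechanical and Social Service values of Table 1 to within rounding.

```python
import math

def t_from_summary(n1, m1, sd1, n2, m2, sd2):
    """t for two independent means from group N, mean, and SD.

    Assumes SE of a mean = SD / sqrt(N - 1); this is an assumption about
    the computation, not a formula given in the article.
    """
    se_diff = math.sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))
    return (m1 - m2) / se_diff

# Mechanical scale: occupied group vs. all others (values from Table 1).
t_mechanical = t_from_summary(26, 81.6, 21.5, 91, 69.5, 22.8)
# Social Service scale, the one non-significant comparison.
t_social = t_from_summary(10, 75.0, 18.3, 107, 64.9, 17.5)
print(f"Mechanical t = {t_mechanical:.2f}, Social Service t = {t_social:.2f}")
```

A pooled-variance t would give a somewhat smaller value for the Mechanical scale, so the choice of formula matters when checking the tabled figures.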

Only mean scores are presented for the ar- 
tistic and musical scales, since few subjects 
reported occupations involving these interests. 
On both of these scales, however, the differ- 
ences are in a direction consistent with those 
found for the other scales. 

For six of the remaining seven scales, the 
data show that men currently engaged in oc- 




Table 1 


Comparisons of Kuder Preference Record Scores Made 
Seven to Nine Years Ago by Men Engaged in 
Occupations Related to an Interest Area 
and by Those Engaged in Other 


Occupations 
Kuder Scale          N      M             S.D.          t
Mechanical
  Occupied           26     81.6          21.5          2.46*
  Others             91     69.5          22.8
Computational
  Occupied           34     [illegible]   [illegible]   [illegible]
  Others             83     35.5           9.9
Scientific
  Occupied           30     86.1          11.1          5.39**
  Others             87     71.1          17.5
Persuasive
  Occupied           53     78.8          17.2          4.48**
  Others             64     64.5          17.0
Artistic
  Occupied            5     47.6          -
  Others            112     40.2          -
Literary
  Occupied           13     65.5          13.6          3.02**
  Others            104     52.7          14.9
Musical
  Occupied            2     31.0          -
  Others            115     19.0          -
Social Service
  Occupied           10     75.0          18.3          1.59
  Others            107     64.9          17.5
Clerical
  Occupied           24     59.0          13.2          3.93**
  Others             93     49.7          15.4


* Significant at the 5 per cent level of confidence. 
** Significant at the 1 per cent level of confidence. 


cupations involving those interests made sig- 
nificantly higher mean scores than did men 
in other occupations. 

The failure of the men engaged in social 
service occupations to show a significantly 
higher social service score than those in other 
occupations probably reflects an inadequacy 
of our sample rather than an inadequacy of 
the scale. Eight of the ten cases classed as 
engaged in a social service occupation were 
students in professional school. Only one of 
these was in a graduate school of social work. 




Three others were engaged in the graduate 
study of liberal arts subjects, three were in 
medical school, and one was in dental school. 
The composition of this sub-group, then, does 
not provide a satisfactory sample of persons 
in this occupational interest area. 

These results provide evidence that inter- 
est scores on the Kuder Preference Record 
are positively related to occupations entered 
seven to nine years later. Further, they indi- 
cate that interests have been sufficiently or- 
ganized by the time the last few years of high 
school are reached to provide one basis for 
estimating future occupational activity. 


Discussion 


Several considerations should be kept in 
mind in interpreting the results of this study. 
In the first place the interest test was ad-
ministered as part of a total counseling serv-
ice. Aptitude and achievement tests were
used along with interest tests to provide a 
basis for personal interviews. It could be 
argued that the decisions arrived at during 
the entire counseling process largely deter- 
mined the occupation entered seven to nine 
years later. If the Kuder scores influenced 
counseling decisions, then the relationship 
found in this study could be due, not to the 
persistence of adolescent interests, but to the 
persisting effects of counseling based on ado-
lescent interests. While our data cannot set-
tle this issue definitively, several facts argue
against the belief that counseling decisions
alone can account for the relationship be-
tween interests and occupational
entry. For one thing, Strong (4) has pre-
sented findings that show persistence of inter-
ests over a long period of time. His original
test data were apparently not collected in a
counseling situation, so that the persistence
of interests he found could not be attributed
to the counseling process. For another thing,
our respondents themselves did not attribute
a great deal of influence to the decisions ar-
rived at in the counseling process. When
asked whether the suggestions made by the
JVS influenced their occupational plans, only
35 per cent said “yes,” 44 per cent said “no,”
and 20 per cent could not recall any influ-
ence. Although the recall of counseling in-
fluence is not the sole valid measure of the
existence of influence, our data certainly do 
not support the view that it determines occu- 
pational entry to a greater extent than the 
persistence of interests. 

A further consideration in interpreting our 
findings concerns the effect of military serv- 
ice and government assistance to veterans in 


school. Perhaps military service had the ef- : 


fect of disrupting the normal peacetime paths 
to occupational entry. Our results would 
then show the minimal relationship between 
adolescent interest and later occupation. On 
the other hand, our respondents may have 
been enabled to enter preferred occupations 
to a greater extent than is usually true, be- 
cause the veterans’ benefits helped them to 
continue their education. The most that can 
be said on this point is that our findings need 
to be supported by data collected during a
period free from the special influences cre-
ated by wartime mobilization and a postwar
economy.


Summary and Conclusions 


In order to discover whether a significant
relationship existed between adolescent inter-
ests and later occupational choice, a question-
naire was mailed to 215 men who had been
counseled seven to nine years earlier during
the latter portion of their high school careers.
Usable information on current occupation was
obtained from 117 of those on the mailing
list. Comparisons of the respondents with
the non-respondents indicated no difference
with respect to age at time of counseling, in-
telligence, and mean scores on eight of the
nine Kuder Preference Record scales. Re-
ported occupations were classified in accord-
ance with the interests they involved as pre-
sented in the Kuder manual.

For six of the Kuder interest areas, men
currently engaged in a related occupation
made significantly higher scores seven to nine
years ago than did men engaged in unrelated
occupations. The three remaining interest
areas (artistic, musical, and social service)
did not yield clear-cut results because of
inadequacies of the sample.




We conclude that interests measured by 
the Kuder Preference Record in adolescence 
are positively related to occupation engaged 
in seven to nine years later. 


Received November 13, 1953. 


References 


1. Barnette, W. L., Jr. Occupational aptitude pat- 
terns of selected groups of counseled veterans. 
Psychol. Monogr., 1951, 65, No. 5 (Whole No. 
322). 


2. Hahn, M. E. and Williams, Cornelia T. The
measured interests of Marine Corps Women
Reservists. J. appl. Psychol., 1945, 29, 198-
211.

3. Kuder, G. F. Revised manual for the Kuder
Preference Record, Vocational, Form B. Chi-
cago: Science Research Associates, 1946.

4. Strong, E. K., Jr. Permanence of interest scores
over 22 years. J. appl. Psychol., 1951, 35,
89-91.

5. Triggs, Frances O. The measured interests of
nurses: A second report. J. educ. Res., 1948,
42, 113-121.


THE JOURNAL OF APPLIED PSYCHOLOGY 
Vol. 38, No. 6, 1954 


The Degree to Which Colors (Hues) Are Associated with 
Mood-Tones * 


Lois B. Wexner 
Division of Education and Applied Psychology, Purdue University 


The literature is replete with statements 
concerning the relation of color and emotional 
states or feeling-tones, but there is a dearth of 
experimental investigation to support these 
statements. In a recent study by Odbert, 
Karwoski, and Eckerson (8), regarding the 
associations of color and mood, it was found 
that some colors were more often chosen to 
go with certain groups of words describing 
mood, such as red with exciting, orange with 
gay, yellow with playful, green with leisurely, 
blue with tender, purple with solemn, and 
black with sad. Two shortcomings of this 
study, however, are first, that the groups of 
words (represented above by exciting, gay, 
etc.) included words which could in no way 
be considered to mean the same thing, such as 
the “playful” list, which included humorous, 
whimsical, fanciful, quaint, sprightly, delicate, 
light, and graceful. Thus one subject may be 
reacting to one particular word in the list, and 
another, to an entirely different one. And, 
second, a partially “forced” method was used 
to fit the moods to a color-circle (arranged 
according to wave-length). For instance, gay 
is reported to “go with” orange, but in reality 
orange was chosen only 16 times, whereas 
red was chosen 62 and yellow 27 times. Fur- 
ther, the authors’ judgment appears to be the 
only method used to choose which colors went 
with which moods. Thus, although the nu- 
merical results are published, a clear-cut sta- 
tistical interpretation is lacking. Other studies 
(1, 2, 3, 9, 10, 13) report the association of 
color and moods, as determined by various 
methods including objective impressions, clini-
cal observation, and introspection.


Purpose 


The purpose of this investigation is to de- 
termine to what degree colors (hues) are as- 
sociated with mood-tones. The hypothesis to 
be tested is that there is a positive relation 
between certain colors and mood-tones. 


1 Grateful acknowledgment is made to James
Norton, Jr., for his helpful suggestions in the
statistical techniques, and to Joyce Block, Malcolm
Robertson, and Henry Wexner, who served as judges.


Procedure 


The mood-tones used in this experiment were 
arbitrarily selected as a fairly representative 
group. Originally, twelve words were chosen, 
i.e., exciting, secure, distressed, tender, protec-
tive, despondent, calm, dignified, cheerful, defiant,
powerful, and sensuous. Then a list of 164
adjectives was prepared, including moods re-
ported in the literature, synonyms of those words
as well as those listed above, and other words the
writer believed might be useful. The original
twelve words were presented to four judges, with
the list of adjectives. The judges (two of whom
were male and two female) were requested to
choose words from the list of adjectives which
they felt meant the same as the “mood-tone”
words. They were allowed to use words more
than once if they wished. Then, the mood-tone
words were listed together with their synonyms
as unanimously chosen by the four judges. Since
the judges did not agree on the meaning of
sensuous, this word was not included in the ex-
periment. The final groups of mood-tones are as
follows: exciting, stimulating; secure, comfort-
able; distressed, disturbed, upset; tender, sooth-
ing; protective, defending; despondent, dejected,
unhappy, melancholy; calm, peaceful, serene; 
dignified, stately; cheerful, jovial, joyful; defiant, 
contrary, hostile; and powerful, strong, masterful. 

The subjects consisted of 94 students in a
course of beginning General Psychology, of which
48 were female and 46 were male. The subjects,
in three groups, were presented with an instruc-
tion sheet containing the word groups as above
and the following directions:

The following groups of words are meant to
represent feelings, or mood-tones. It is thought
that certain colors tend to “go with” various
mood-tones, and this is an attempt to deter-
mine to what extent this may be true. Please
select the one color, of the colors on the charts,
that you feel best represents the feelings de-
scribed by the following word groups. All the
colors need not be used, and colors may be
used more than once. Be sure to select a
color for each group, even though it may seem
difficult to find a color to fit the mood-tone.
Usually your first impression would be the
best one, if in doubt.

Eight colors, yellow, orange, red, purple, brown, blue, black, and
green, in the form of [?] inch pieces of art paper mounted on [?] inch
pieces of light-gray cardboard, were randomly arranged at the front of
the room. It should be noted that no mention of color names was made
by the experimenter. This was done in order to avoid associations to
color stereotypes and to assure that the colors as chosen were the
particular shades presented to the subjects, in an attempt to insure
uniformity of shade. It might further be noted that there was no
difficulty in determining which colors the subjects intended to
indicate.

Chi-square tests for any possible sex differences in color association
to mood-tones were made. No significant differences were found to
exist.

Since there were no significant sex differences, the frequencies from
the two sexes were combined into one set for further study. Then, for
each mood-tone, a chi-square test was made to test whether or not the
colors differed significantly in frequency of association with that
mood-tone. These chi-squares were significant in all cases. (A five per
cent significance level was used.) Thus it is demonstrated that some
colors are more often associated with a given mood-tone than others.

The next step was to determine which particular colors were most often
associated with a given mood-tone. For this purpose, Tukey’s (14)
procedure for accomplishing multiple comparisons among a set of
observed means was adapted to make multiple comparisons among a set of
observed frequencies in mutually exclusive categories (7). The
essential nature of this adaptation was to use the inverse sine
transformation upon the observed proportions. The error variances of
such transformed proportions are given by the theory of the
transformation. In all cases a significance level of five per cent was
used.


Results


The following results were obtained. For each mood-tone, the colors are
grouped (A, B, C, etc.) according to the results of the multiple
comparisons tests. The interpretation of these groups is as follows:
colors in the same group are associated with the mood-tone
significantly more often than colors in groups below them, and
significantly less often than colors in groups above them. Colors in
the same group do not differ significantly from each other in frequency
of association with the mood-tone. ([?] marks an entry that is
illegible in the source.)


Exciting, stimulating.

Group   Color    Frequency
A       Red      61
B       Yellow   12
        Orange   11
C       Green    [?]
        Purple   [?]
        Black    2
        Blue     [?]
        Brown    [?]


Secure, comfortable.

Group   Color    Frequency
A       Blue     41
B       Brown    23
        Green    18
C       Yellow   8
D       Orange   [?]
        Black    [?]
        Red      0
        Purple   0


Distressed, disturbed, upset.

Group   Color    Frequency
A       Orange   34
B       Black    16
C       Purple   10
        Brown    9
        Green    8
        Red      7
        Yellow   5
        Blue     5


Tender, soothing.

Group   Color    Frequency
A       Blue     41
B       Green    24
C       Yellow   11
        Purple   9
        Brown    6
D       Orange   2
        Black    1
        Red      0


Protective, defending.

Group   Color    Frequency
A       Red      [?]
        Brown    [?]
        Blue     [?]
        Black    15
        Purple   14
B       Green    5
        Orange   4
        Yellow   3


Despondent, dejected, unhappy, melancholy.

Group   Color    Frequency
A       Black    25
        Brown    25
B       Purple   11
        Blue     11
        Green    9
        Yellow   5
        Orange   4
C       Red      0


Calm, peaceful, serene.

Group   Color    Frequency
A       Blue     38
        Green    31
[?]     Yellow   [?]
        Purple   [?]
        Orange   [?]
        Brown    [?]
        Black    [?]
        Red      [?]


Dignified, stately.

Group   Color    Frequency
A       Purple   45
[?]     Black    30
        Blue     9
        Brown    [?]
        Red      [?]
        Orange   [?]
        Yellow   [?]
        Green    [?]


Cheerful, jovial, joyful.

Group   Color    Frequency
A       Yellow   [?]
[?]     Red      [?]
        Orange   [?]
        Green    [?]
        Blue     [?]
        Purple   [?]
        Brown    [?]
        Black    [?]


Defiant, contrary, hostile.

Group   Color    Frequency
A       Red      23
        Orange   21
        Black    18
[?]     Brown    11
        Purple   9
        Yellow   5
        Green    5
        Blue     2


Powerful, strong, masterful.

Group   Color    Frequency
A       Black    48
[?]     Red      23
        Purple   [?]
        Blue     [?]
        Brown    [?]
        Orange   [?]
        Yellow   [?]
        Green    [?]
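The two statistical steps named in the Procedure (a chi-square test per mood-tone, then the inverse sine transformation of the observed proportions) can be illustrated with the one fully legible frequency table, "defiant, contrary, hostile." The sketch below is only an illustration under stated assumptions: the article does not say what expected frequencies were used, so equal expected frequencies (94/8 per color) are assumed; the variance 1/(4n) is the standard large-sample approximation for the transformed proportion in radians; and the function names are hypothetical. It does not reproduce the adapted Tukey multiple-comparisons procedure itself.

```python
import math

# Frequencies reported for "defiant, contrary, hostile" (94 subjects).
defiant = {"Red": 23, "Orange": 21, "Black": 18, "Brown": 11,
           "Purple": 9, "Yellow": 5, "Green": 5, "Blue": 2}


def chi_square_equal(freqs):
    """Chi-square statistic against equal expected frequencies (an assumption)."""
    total = sum(freqs)
    expected = total / len(freqs)
    return sum((f - expected) ** 2 / expected for f in freqs)


def arcsine(freq, total):
    """Inverse sine (angular) transformation of a proportion, in radians."""
    return math.asin(math.sqrt(freq / total))


def approx_variance(total):
    """Large-sample variance of the transformed proportion: 1 / (4n)."""
    return 1.0 / (4.0 * total)


n = sum(defiant.values())
stat = chi_square_equal(list(defiant.values()))
CRITICAL_5_PCT_DF7 = 14.067  # tabled chi-square value for 7 degrees of freedom

print(f"chi-square = {stat:.2f}; exceeds 5% critical value: {stat > CRITICAL_5_PCT_DF7}")
for color, f in defiant.items():
    print(f"{color:7s} {f:3d}  transformed = {arcsine(f, n):.3f}")
print(f"approximate s.e. of each transformed value = {math.sqrt(approx_variance(n)):.4f}")
```

Under these assumptions the statistic for this table is well above the five per cent critical value, which agrees with the statement that the chi-squares were significant in all cases.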


Discussion 


In general, the results of this investigation 
tend to support the color-mood studies as re- 
ported in the literature. It should be noted, 
however, that the association of some mood- 
tones with certain colors is more clear-cut 
than others. For instance, in some cases one 
color “goes with” a mood-tone significantly
more often than does any other color (of the
particular shades of colors used in this ex-
periment). Red is more often associated
with exciting-stimulating, blue with secure-
comfortable, orange with distressed-disturbed-
upset, blue with tender-soothing, purple with
dignified-stately, yellow with cheerful-jovial-
joyful, and black with powerful-strong-mas-
terful. On the other hand, there is no
statistically significant difference between
certain colors in their association with cer-
tain other mood-tones, such as red, brown,
blue, black, and purple with protective-de-
fending; black and brown with despondent-
dejected-unhappy-melancholy; blue and green
with calm-peaceful-serene; and red, orange,
and black with defiant-contrary-hostile.

Since there appears to be fairly consistent
agreement among the studies on this subject,
it is appropriate to suggest possible con-
tributing factors, although it is not the pur-
pose of this paper to investigate this par-
ticular aspect of the problem. In addition
to the cultural factor which no doubt plays
an important part in the associations of
colors with certain mood-tones, there seems
to be the possibility of the existence of bio-
logical determinants. Guilford (6) states
that experimental results “point very strongly
to a basic communality of color preferences
among individuals. This communality prob-
ably rests upon biological factors, since it is
hard to see how cultural factors could pro-
duce by conditioning the continuity and sys-
tem that undoubtedly exists.” Goldstein (5)
is more explicit in setting forth physiological
effects of color on the human organism, and
indicates that patients, exposed to various
colors such as large sheets of colored paper,
change the position of the arms in different
directions, according to the color to which
they are exposed; that color influences the
speed of volitional movements; and that seen
and felt distances and time intervals and
weights are judged differently under the in-
fluence of different colors. He finds, further-
more, that green favors performance in gen- 
eral, in contrast to red, and feels that these 
different effects correspond to very definite, 
but different, total behavioral attitudes, which 
find their expression very clearly in the sub- 
ject’s reports of the mood corresponding to 
the various colors. 

In addition to the part played by learning
in the cultural and biological determinants of 
associations of colors with certain mood- 
tones, there may be an additional factor 
which should be included, in the form of par- 
ticular learning situations, which may affect 
individuals and/or groups. An experiment in 
support of this type of contribution was done 
by Staples and Walton (12). 

The foregoing are merely suggested as pos- 
sible contributing factors to color and mood 
association, and the need for additional ex-
perimental work in this area is obvious.

With regard to the present investigation, it 
would seem possible, and even likely, that in 
a similar experiment different results might 
be obtained if different shades of the same 
colors were used. For instance, in a discus-
sion with a group of the subjects after the
data had been collected, the writer men-
tioned that she had expected purple to “go
with” powerful. One of the subjects replied
that the particular shade of purple was not
deep and dark enough to be a “powerful”
purple. Thus it would appear that extreme
caution should be used in generalizing these
findings to other shades of the same colors. 
However, because of the positive findings of 
this experiment, it would appear that useful 
information could be obtained by extending 
this type of investigation to other groups. 
Such information might possibly be of ex- 
tensive value to both industrial and clinical 
psychologists.

Summary 

In an attempt to determine to what degree 
colors (hues) are associated with mood-tones,
94 subjects were presented with eight stimu-
lus colors (red, orange, yellow, green, blue,
purple, brown, and black) and a list of eleven
moods (exciting-stimulating; secure-comfort-
able; distressed-disturbed-upset; tender-sooth-
ing; protective-defending; despondent-de-
jected-unhappy-melancholy; calm-peaceful-
serene; dignified-stately; cheerful-jovial-joy- 
ful; defiant-contrary-hostile; and powerful- 
strong-masterful), the word selections of 
which had been unanimously agreed upon by 
four judges. No significant differences were 
found in color-mood association between male 
and female. It was found, however, that for 
each mood-tone certain colors were chosen to 
“go with” that mood-tone significantly more 
often than the remaining colors, and the re- 
sults were stated. 

Inasmuch as there was general agreement 
among studies concerning mood and color as- 
sociation, several possibilities for this were 
given, such being the influence of cultural, 
biological, and learning factors. 


Received December 3, 1953. 


References 


1. Birren, F. Color psychology and color therapy. 
New York: McGraw-Hill, 1950. 

2. Buck, J. N. Proceedings of the H-T-P work- 
shop. Richmond, Va.: Veterans Administra- 
tion Hospital, 1950. 

3. Chandler, A. R. Beauty and human nature. 
New York: Appleton-Century, 1934. 

4. Eisenhart, C., Hastay, M. W., and Wallis, W. A. 
Techniques of statistical analysis. New York: 
McGraw-Hill, 1947. 

5. Goldstein, K. The organism. New York: Ameri-
can Book Co., 1939.

6. Guilford, J. P. There is system in color pref-
erences. J. opt. soc. Amer., 1940, 30, 455-459.

7. Nair, K. R. The distribution of the extreme
deviate from the sample mean and its stu-
dentized form. Biometrika, 1948, 35, 118-144.

8. Odbert, H. S., Karwoski, T. F., and Eckerson,
A. B. Studies in synesthetic thinking: I.
Musical and verbal associations of color and
mood. J. gen. psychol., 1942, 26, 153-173.

9. Risler, J. L’influence psychologique de la lumiere.
(The psychological influence of light.) Cour.
med., 1927, 77, 40-42.

10. Rogers, Marian E. A study of color prefer-
ences. Unpublished master’s thesis, Purdue
University, 1950.

11. Snedecor, G. W. Statistical methods (4th ed.).
Ames: Iowa State College Press, 1946.

12. Staples, R. and Walton, W. E. A study of
pleasurable experiences as a factor in color
preference. J. genet. psychol., 1933, 43, 217-
223.

13. Tatibana, Y. Color feelings of the Japanese.
I. The inherent emotional effects of colors.
Tohoku psychol. Fol., 1937, 5, 21-46.

14. Tukey, J. W. Comparing individual means in
the analysis of variance. Biometrics, 1949, 5,
99-114.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Readability of Mathematical Tables * 


Miles A. Tinker 


University of Minnesota 


Casual examination of several mathematical 
or statistical tables will reveal great variation 
in the typographical arrangements employed. 
From table to table the reader may find 
variation in type size, type face, use of addi- 
tional leading at periodic intervals, number of 
decimal places employed, etc. To a reader 
with some background in scientific typogra- 
phy, it is obvious that some of these factors 
should influence the readability of the tables. 
Since a particular mathematical table may be 
put to a number of different uses by workers 
or students in a variety of scientific fields, it 
would seem that arbitrary choice of a specific 
typographical arrangement for use in a cer- 
tain field is not the most important factor to 
consider, particularly with tables of squares, 
cubes, square roots, and cube roots which are 
widely used. In some tables, economy of 
space seems to be the sole consideration with 
no attention to readability factors. Where 
readability (or legibility) is mentioned, as by 
Milne (3), choice of typography depends 
upon opinion rather than upon experimental 
findings. 

Actually, there has been no experimental 
work done on the readability of mathematical 
tables. A few related findings in specialized 


kinds of reading may be cited: Baird (2) 
studied the legibility of a telephone directory. 
He found that 4 point leading between lines
was 13 per cent more efficient in terms of
time taken to find a number than when set
solid. He also found that indenting every
other line in the directory increased only
slightly (probably not significantly) the speed
and accuracy of locating telephone numbers
in comparison with an even alignment of
names. Scott (5) had subjects read two
pages of a railroad time-table, each set up in
light-faced small type and heavy-faced large
type. The large heavy-faced type was read
faster and with considerably fewer errors.
Size of type rather than heaviness of type
face may have been the important factor.
After inspecting a number of mathematical
tables, Babbage (1) expressed a preference
for numerals of uniform height (Modern)
rather than those with ascenders and descend-
ers (Old Style). In a report (4) of the
Committee on Type Faces it is recommended,
on the basis of collected opinions, that mod-
ernized Old Style numerals be used in mathe-
matical tables. Reading the Old Style nu-
merals is considered to produce less fatigue.
Milne (3) also considers the Old Style nu-
merical symbols, in which most of the char-
acters have heads or tails, to be more legible
than those of uniform height. Tinker
determined (a) the relative visibility of
Modern and Old Style numerals by obtain-
ing the average distance from the eyes at
which the numerals could be read correctly
and (b) the speed and accuracy of reading
the two kinds of numerals. The Old Style
numerals, read in isolation, were slightly more
visible (probability at the 2 per cent level)
but in groups were much more visible (prob-
ability beyond the one per cent level). The
Modern numerals in groups were read as
fast and just as accurately under ordinary
reading conditions as the Old Style numerals.
It was suggested that when numerals are
printed in groups as in tables, the Old
Style numerals be used because they are per-
ceived more easily (more visible).

The above citations merely suggest what
might be more satisfactory in terms of size of
type, leading, and type style. Since no ex-
perimenting with actual tabular material has
been done, the need for some exploratory in-
vestigation seems indicated. The purpose of
the present study is to investigate the com-
parative readability of five mathematical
tables in terms of the speed with which sub-
jects can find the squares, square roots and
cube roots of numbers. Tables were chosen
which permitted comparisons between type
sizes, type faces, and arrangement of columns
and rows of numerals.

* The writer is grateful to the University of Min-
nesota Graduate School for research grant to finance
this study.


Materials and Procedure 


The five published tables will be designated by 
the letters: A, B, C, D, and E. For purposes of 
comparison we will need a rather complete de-
scription of each table. Only tables which in-
cluded squares, cubes, square roots and cube
roots were used from each book (except Table A
which did not include cubes and cube roots).
Table 1 shows the columnar arrangements of the
five mathematical tables. Fractions that are
illegible in the source are marked [?] below.

Table A has a 6 × 9 inch page. The numerals
are printed in an 8 point Modern (all numerals
same height) type set solid with successive groups
of five entries down the columns separated by 8
point leading. The first column (No.) is in bold
face and the remaining numerals ordinary light-
face. Decimals are carried to three places. In
the square column, there is [?] pica space between
each set of two numerals along a line. Columns
are separated by a 1 pica space with no rule.
The paper is a good quality mat white and thick
enough so that shadows from print on the reverse
side do not show through.

Table B has a 5[?] × 8[?] inch page. The nu-
merals are printed in an 8 point Old Style type
(ascenders and descenders) set solid with succes-
sive groups of five entries down the columns sepa-
rated by 8 point leading. The first column (No.)
is in bold face and the remaining numerals in
ordinary face. Decimals are carried to seven
places. In the square column, the numerals
along a line are grouped in twos as in Table A,
and by threes in the cube column. There is [?] to
[?] pica space plus a rule between various columns.
The paper is good quality mat white and thick
enough so that no shadows show through.

Table C has a 3[?] × 6[?] inch page. The numer-
als are printed in a 6 point Modern type, set
solid with groups of 10 entries down the columns
separated by 6 point leading. There is no bold
face type in this table. Decimals are carried to
four places. There are no groupings into twos
or threes along lines in the squares and cubes.
There is a 1 pica space with no rule between
columns. The paper is a good quality mat white
and thick enough so that no shadows show
through.

Table D has a 3[?] × 6[?] inch page. The numer-
als are printed in a 6 point Modern type, set
solid with no grouping of entries down the col-
umns, but each fifth entry down a column is in
bold face which is only a little darker than the
rest of the printing. Decimals are carried to 7
places in the square and cube roots. There are
no groupings along lines into twos or threes in the
squares and cubes. There is a [?] pica space plus
a rule between columns. The mat grayish white
paper is so thin that shadows from print on the
reverse side show through enough to hinder dis-
crimination of the numerals. The printed page
impresses the reader as being crowded and diffi-
cult to read.

Table E has a 4[?] × 6[?] inch page. The numer-
als are printed in a 6 point Modern type, set
solid with groups of 10 entries down the columns
separated by 6 point leading. Numerals in the
No. column are in bold face. Square root deci-
mals are carried to four places, cube root to five
places. There are no groupings along lines into
twos or threes in squares and cubes. There is a
[?] to 1 pica space between various columns plus
a rule. The mat grayish paper is so thin that
disturbing shadows show through from the print
on the reverse side but these shadows are not as
prominent as in Table D.

The typographical arrangements of these five
tables permit a number of interesting readability
comparisons: type face, A vs. B; type size, A vs.
C; leading between grouping of entries down col-
umns, D vs. E; arrangement of columns, various;
etc. All tables have 50 entries per column ex-
cept D which has approximately 75.


Table 1
Arrangement of Columns in Five Mathematical Tables

Table   Columnar Arrangement (left to right)
A       No.; Square; Square Root; No.; Square; Square Root; No.; Square; Square Root
B       No.; Square; Cube; Fourth Power; [?]; Square Root; Cube Root; [?]
C       No.; Square; Cube; Square Root; Cube Root; Circum.; Area; No.
D       No.; Circum.; Area; Square; Cube; Square Root; Cube Root; Reciprocal
E       No.; 1000/No.; Square; Cube; Square Root; Cube Root; Circum. of Circle; Area of Circle

The experiment was carried out in a labora- 
tory room with uniform illumination of 20 foot- 
candles. The subjects were 120 university stu-
dents. There were two general procedures: 1. The
book was opened to page one of the table and on
presentation of the number, the subject found
the entry, turning pages where necessary, and
read off the response. 2. The book was opened
to the page of the table containing the number
involved. Upon presentation of the number, the
subject found the entry and read off the re-
sponse.

This was done separately for the finding of
squares, square roots, and cube roots. There 
were, therefore, six parts to the experiment. 
Twenty different subjects observed on each part. 
Materials (the five tables) and subjects were 
systematically rotated within each part to equate 
practice effects. 

Ten numbers were looked up in each table by 
each subject. Upon arrival at the laboratory the 
subject was allowed to look over the five tables 
to become acquainted with them. He was then 
told that he would be presented with a number 
and that he was to look up and read off aloud 
the square (square root, cube root) as rapidly as 
possible. Two practice trials were given on each 
table just before it was used. The number to be 


looked up was presented typed on a 3 X 5 inch
index card. The ten numbers to [...] 982). A
different series of numbers [...] each table. [...]
tenths of a second from [...] number (uncovered on
table before the subject) to the beginning of the
spoken response. All errors were tabulated. No
information about results was given to a subject
until the experiment was completed.


Results 


The basic data of this study are given in
Table 2. Comparison of the lower with the
upper half of the table reveals that much
more time is taken to find squares, square
roots and cube roots of a number when the
book is opened at the beginning of the mathe-
matical table (subject finds page) than when
the book is opened to the page containing the
item (subject given page). When mean scores
for one-half of the subjects were compared
with those for the other half the consistency
of trends from mathematical table to table
was high in each part of the experiment.


Table 2

Mean Time in Seconds Taken to Locate Squares,
Square Roots and Cube Roots in Five
Mathematical Tables
(N = 20 college students in each comparison, 120 in all)

             Squares         Square Root       Cube Root
Table      Mean   S.D.      Mean   S.D.      Mean   S.D.

Subject Finds Page
A          5.18    .61      5.04    .49       --     --
B          5.24    .72      5.77    .86      5.20    .87
C          5.52    [?]      5.70    .99      5.90    [?]
D          6.34    .91      6.43   1.04      6.06    [?]
E          5.49    .71      6.30   1.08      6.06    [?]

Page Given Subject
A          2.99    .40      2.90    .37       --     --
B          2.74    .52      2.90    .66      2.77    [?]
C          2.74    .54      2.92    .62      2.96    [?]
D          3.71    .61      3.50    .86      3.41    [?]
E          3.02    .66      3.06    .69      2.90    [?]

Note: Original computations were carried to four
decimal places. Entries marked [?] are illegible in
the source; the means 5.52 (Table C, squares) and 2.77
(Table B, cube roots) are restored from the differences
reported in Tables 3 and 4.

Correlation of the ranks obtained from these
scores ranged from .80 to 1.00. Tabulation
of the errors revealed a high percentage of
accuracy. In only two per cent of the
responses were there errors. There was little
difference in error count from table to table
although they were somewhat fewer in mathe-
matical Table A. We may, therefore, con-
centrate our attention on the speed scores.
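The split-half check described above compares how two halves of the subjects rank the five tables. The article does not say which rank-correlation coefficient was used; the sketch below assumes Spearman's rho, and the rank orders in it are hypothetical illustrations, not the study's data. With only five tables, swapping one adjacent pair of ranks gives .90 and swapping two separate pairs gives .80, so the reported range of .80 to 1.00 corresponds to very few rank reversals.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rank orders of the five tables (1 = fastest) in two half-samples.
first_half = [1, 2, 3, 4, 5]
identical = [1, 2, 3, 4, 5]       # same order in both halves
one_swap = [2, 1, 3, 4, 5]        # one adjacent pair reversed
two_swaps = [2, 1, 4, 3, 5]       # two separate adjacent pairs reversed

for label, other in (("identical", identical),
                     ("one swap", one_swap),
                     ("two swaps", two_swaps)):
    print(f"{label:10s} rho = {spearman_rho(first_half, other):.2f}")
```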
In Table 3 are listed differences and criti-
cal ratios for reading squares, square roots
and cube roots of numbers when the subject
always started at page one of the mathemati-
cal table. In finding squares of numbers,
starting always at the beginning of the mathe-
matical table, times were significantly faster
in Tables A, B, C and E than in D. With
less certainty, time for A was faster than C.
Note also that A was better than E, and
that B was better than C and E although the
differences were not significant. There was
no important difference between A and B, or
C and E. (The whole pattern of differences
will be coordinated below under discussion.)
The data on significance of differences in
finding square roots, starting at the beginning


Table 3


Differences Between Means in Seconds with Critical Ratios for Finding Squares, Square Roots and
Cube Roots in Five Mathematical Tables When Subject Finds Page


               Squares            Square Roots        Cube Roots
Tables
Compared     Diff.    C.R.       Diff.    C.R.       Diff.   C.R.
A vs. B      + .06    0.29       + .73    [?]**       --      --
A vs. C      + .34    1.65       + .66    2.68**      --      --
A vs. D      +1.16    4.74**     +1.39    5.41**      --      --
A vs. E      + .31    1.48       +1.26    [?]         --      --
B vs. C      + .28    1.25       - .07    0.24       +.70    2.13*
B vs. D      +1.10    4.24**     + .66    2.19*      +.86    2.40*
B vs. E      + .25    1.10       + .53    1.72       +.86    2.50*
C vs. D      + .82    3.19**     + .73    2.27*      +.16    0.45
C vs. E      - .03    0.13       + .60    1.84       +.16    0.47
D vs. E      - .85    3.29**     - .13    0.39        .00    0.00


* Significant at the 5 per cent level.
** Significant at the 1 per cent level.
[?] Illegible in the source.


of the mathematical tables, reveal that Table 
A was significantly superior to all other Ta- 
bles (B, C, D, E). With a lesser degree of 
significance (5 per cent level), B was better 
than D, and C than D. Also, B was better 
than E and C than E although the signifi- 
cance of the differences did not reach the 5 
per cent level. 

In locating cube roots, Table B was su- 
perior to C, D, and E. No other differences 
were significant.
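The article does not give the formula behind its critical ratios. A minimal sketch, assuming the ratio is the difference between two means divided by the standard error of the difference for uncorrelated means with N = 20 each, comes close to the tabled values when fed the rounded means and standard deviations of Table 2. The function name is hypothetical, and small discrepancies are expected because the original computations were carried to four decimal places.

```python
import math


def critical_ratio(mean_1, sd_1, mean_2, sd_2, n):
    """Difference between two means over the standard error of the difference.

    Assumes uncorrelated means and the same number of subjects (n) behind each.
    """
    se_diff = math.sqrt(sd_1 ** 2 / n + sd_2 ** 2 / n)
    return (mean_1 - mean_2) / se_diff


# Squares, subject finds page (means and S.D.s from Table 2, N = 20).
cr_a_vs_d = critical_ratio(6.34, 0.91, 5.18, 0.61, 20)  # Table 3 lists 4.74
cr_b_vs_d = critical_ratio(6.34, 0.91, 5.24, 0.72, 20)  # Table 3 lists 4.24
cr_a_vs_b = critical_ratio(5.24, 0.72, 5.18, 0.61, 20)  # Table 3 lists 0.29

for label, value in (("A vs. D", cr_a_vs_d),
                     ("B vs. D", cr_b_vs_d),
                     ("A vs. B", cr_a_vs_b)):
    print(f"{label}: C.R. = {value:.2f}")
```

That the tabled values match this formula suggests, though it does not prove, that the means were treated as uncorrelated even though each subject read all five tables within a part of the experiment.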

Turning now to the situations in which the 


mathematical tables were opened by the ex- 
perimenter to the page containing the item to 
be located, the data on significance of differ- 
ences for finding squares, square roots and 
cube roots are given in Table 4. In finding 
squares, Tables A, B, C and E are much 
better than D. B and C are somewhat better 
than A although not significant at the 5 per 
cent level. There is no important difference 
between A and E, or B and C. 

The data for significance of differences for 
square roots with proper page given to the 


Table 4

Differences Between Means in Seconds with Critical Ratios for Finding Squares, Square Roots and
Cube Roots in Five Mathematical Tables When Subject is Given Page

Tables          Squares            Square Roots          Cube Roots
Compared     Diff.    C.R.       Diff.    C.R.         Diff.    C.R.
A vs. B      -.25     1.71        .00     0.00          --       --
A vs. C      -.25     1.66       +.02     0.12          --       --
A vs. D      +.72     4.43**     +.60     2.87**        --       --
A vs. E      +.03     0.17       +.16     0.92          --       --
B vs. C       .00     0.00       +.02     0.10         +.19     0.83
B vs. D      +.97     5.42**     +.60     2.48*        +.64     2.67**
B vs. E      +.28     1.49       +.16     0.75         +.13     0.59
C vs. D      +.97     5.32**     +.58     2.46*        +.45     1.78
C vs. E      +.28     1.47       +.14     0.68         -.06     0.25
D vs. E      -.69     3.45**     -.44     1.79         -.51     2.03*

 * Significant at the 5 per cent level.
** Significant at the 1 per cent level.


subject reveal that Tables A, B and C were 
much better than D. There were no other 
significant differences.

Data on significance of differences in look-
ing up cube roots, page given, indicate that
Tables B and E were considerably better than 
D. Other differences are unimportant. 
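
Because every entry in Tables 3 and 4 is a difference between two table means, the entries are not independent: the difference for any pair can be recovered from two other pairs that share a table. A minimal sketch of that arithmetic check follows, using the cube-root figures for the page-given condition. The sign convention (the second table's mean minus the first's) is inferred from the tables rather than stated in the text, and the function name is hypothetical.

```python
# Sketch: pairwise differences between means must be additive.
# Assumed sign convention: "X vs. Y" = mean time for Y minus mean time for X.

def implied_difference(x_vs_y: float, x_vs_z: float) -> float:
    """Given X vs. Y and X vs. Z, return the implied Y vs. Z difference."""
    return round(x_vs_z - x_vs_y, 2)


# Cube roots, page given (Table 4), in seconds.
b_vs_c, b_vs_d, b_vs_e = 0.19, 0.64, 0.13
c_vs_d, c_vs_e = 0.45, -0.06

print(implied_difference(b_vs_c, b_vs_d))  # C vs. D implied by the B rows
print(implied_difference(b_vs_d, b_vs_e))  # D vs. E implied by the B rows
print(implied_difference(c_vs_d, c_vs_e))  # D vs. E implied by the C rows
```

The B rows reproduce the tabled C vs. D difference of .45, and both routes imply a D vs. E difference of -.51 seconds for cube roots.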


Discussion 


The results obtained when the subject al- 
ways started from the beginning page of the 
mathematical tables will be considered first. 
This is the kind of situation encountered by 
the reader in his customary use of tables of 
this kind. In practically every instance, a 
significantly greater time was required to lo- 
cate squares, square roots and cube roots in 
Table D than in the other tables. In Table 
D, the type was smaller than in A or B; in 
D there was no additional leading to separate 
groups of items down columns to aid in following
across rows in contrast to A, B, C and
E; also, the paper was thinner in D than in
A, B and C. It is possible that the arrangement
of columns in D retarded finding the
proper entry since it was necessary to skip
over the first two columns next to the No.
column. However, this may not be important
since an analogous situation occurs for
some entries in B and E. The fact that E is
better than D for finding squares is probably
due to the additional leading which separates
groups of items down columns.

Keeping in mind that in A and B the spacing
and type size was the same (8 point),
and that the Squares column was next to the
No. column in both, the lack of difference in
finding squares suggests that style of type
face is unimportant (Modern vs. Old Style).
In A, square roots were next to the squares (third
column), but in B they were in the sixth
column.

In Tables C and E, the type size, spacing
down rows, arrangement of columns, and
spacing between columns were alike or very
similar. This probably explains the lack of
difference in finding squares and in finding
cube roots.



Lack of difference between D and E in find-
ing square roots and cube roots seems to be
due to the fact that the typography is similar
in both: thin paper, intercolumnar space
and rules, 6 point type, and arrangement of
columns. Reason for the slight superiority
of E over D in finding squares is not clear.
Perhaps column arrangement was a factor, for
the squares were in the third column in D.

The factors which favor readability of
mathematical tables, when the reader starts
at the beginning of each table, may be
summed up as follows: Improved readability
is achieved by larger type (8 vs. 6 point), by
using at least 1 pica space between columns
with no rules (rules probably do not lessen
readability provided there is adequate space
between columns, e.g., Table B), by separat-
ing items down a column into groups of five
or ten by leading equivalent to the type size
used, by a favorable arrangement of columns
across page (as No., square, square root,
cube, cube root when one is interested mainly
in squares, cubes and roots), and by use of
mat white paper thick enough to prevent
shadows showing through from print on the
reverse side.

When the reader began at page one in find-
ing squares and square roots, as discussed
above, the order of the mathematical tables
from most to least readable was found to be
A, B, C, E, D. Table A was by far the best
(errors as well as time) and D was by far
the least readable.

When we consider readability uncompli-
cated by turning pages (given page), the pic-
ture is similar but not the same. Tables A,
B, C and E are much more readable than D.
The main differences in typography common
to A, B, C and E in contrast to D are the
grouping of items down columns by inserting
leading after every fifth or tenth entry, and
more adequate spacing between columns. In
addition it should be noted that A and B are in
8 point type and are printed on thick paper
in contrast to D which is in 6 point type on
thin paper. The slight superiority of B and
C over A is difficult to understand. Table B
is in 8, and C in 6 point type while A is in
8 point. The only typographical item
common to B and C in contrast to A is that




in B and C there is only one set of columns 
present (50 No. items) per page while in A 
there are three sets of columns of 50 items 
each in the No., square, and square root col- 
umns, i.e., 150 successive items per page. It 
is possible that the need to locate the proper 
No. column as well as the numeral hindered 
the reader somewhat in Table A. It is un- 
likely that the Old Style numerals in B vs. a 
modern style in A was important although 
this should be noted. 

The lack of any difference between A and 

E must have a similar explanation. The ad- 
vantage of 8 point type in A vs. 6 point in E 
seems to be nullified by the need to identify 
the correct No. column as well as the desired 
numeral in A. The reason for lack of any 
difference between B (8 point) and C (6 
point) is not clear. Two typographical dif- 
ferences should be noted: 1. In C the columns 
are separated by a 1 pica space while in B 
they are separated by a 1 pica space plus a 
rule. 2. In C the column entries are grouped 
in tens while in B they are grouped by fives. 
Grouping by tens may have an advantage. 
It is possible that the intercolumnar spacing 
and the grouping by tens in C offset the ad- 
vantage of the larger type in B. 
The slight superiority of B over E must be
in the quality of the paper (thick vs. thin)
and size of type (8 vs. 6). And the slight
superiority of C over E may be due to quality
of paper plus perhaps the space rather than
rules between columns.

There are fewer significant differences be-
tween the mathematical tables for finding
square roots with page given. A, B, and C
are better than D. The essential typographi-
cal differences between D and the other
tables are lack of grouping down columns,
and less adequate spacing between columns
plus perhaps quality of paper.

The pattern of differences when looking up
cube roots is similar to that for square roots.
B and E are better than D. Other differ-
ences are not important. Apparently the
same typographical factors are operating as
in looking up square roots.

The above discussed factors which favor
readability of mathematical tables when the
reader is given the page on which the number
appears (no turning of pages) may be
summarized as follows: grouping of items 
down columns by inserting ample leading 
(grouping by tens may be better than by 
fives), use of one set of columns per page, 
use of at least a 1 pica space between columns 
without rules, use of white mat paper thick 
enough so that shadows from print on the re- 
verse side do not show through, plus perhaps 
size of type. Variation in style of type face 
seems unimportant. Apparently the sugges- 
tions by Milne (3), Tinker (6) and others 
(1, 4) that Old Style numerals should be 
more legible than a modern face when used in 
tables do not hold. 

When the page on which a number was to 
appear was given the reader in finding squares 
and square roots, the order of the mathemati- 
cal tables from most to least readable was 
(roughly) B, C, A, E, D. The difference be- 
tween B and C was small. 

This experiment was designed as a pre- 
liminary investigation of the readability of 
mathematical tables. Obviously, the experi- 
mental design is imperfect. There are too 
many variables involved. One variable at a 
time should be studied, or a design should be 
employed that permits isolation of the vari- 
ance due to each variable. Nevertheless, the 
present results suggest that the more im- 
portant typographical factors favoring good 
readability are use of at least 8 point type, a 
favorable arrangement of columns, at least 1 
pica space between columns without rules, 
ample leading to separate entries down col- 
umns into groups of five or ten, and paper 
thick enough to prevent shadows showing 
through from print on the reverse side of page. 


Summary 


1. The purpose of this experiment is to in- 
vestigate the influence of certain typographi- 
cal variations upon the readability of mathe- 
matical tables. 

2. Times in seconds to look up squares, 
square roots, and cube roots were obtained: 
(a) when subjects always started with page 
one of the tables; and (b) when the page 
containing the number sought was given. 

3. Twenty adult subjects served for each 
of the six parts to the experiment, 120 in all. 



4. Five mathematical tables representing a 
wide range of typographical variations were 
used. 

5. The results of this experiment revealed 
certain typographical factors which promote 
more effective readability as well as certain 
conditions that should be avoided. The evi- 
dence educed here suggests the following as 
an effective typographical arrangement for 
mathematical tables: 


a. Do not crowd an excessive number of 
columns into a table. This is apt to occur in 
general purpose tables which include such 
things as reciprocals, areas, etc. in addition 
to squares, cubes and roots. 

b. Use only one set of columns per page 
with about 50 entries per column. 

c. Use at least 8 point type, either Old 
Style or a modern face. 

d. Employ generous leading to separate 
numerals into groups of five down columns, 
and then show grouping into tens by an un- 
derline below each tenth row or by some 
other technique which will be easily noted. 


e. Use at least 1 pica space between col- 
umns without rules. 




f. Use bold face printing in the No. col- 
umn. 

g. Use paper thick enough so that shadows 
from print on the reverse side of the page do 
not show through. 

h. Use mat white paper and jet black ink 
to assure maximum contrast between ink and 
paper. 


Received December 8, 1953. 


References 


1. Babbage, C. Tables of logarithms (Introduction). 
London: J. Mawman and Co., 1827. 

2. Baird, J. W. The legibility of a telephone direc- 
tory. J. appl. Psychol., 1917, 1, 30-37. 

3. Milne, J. R. The arrangement of mathematical
tables. Ed. by C. Knott, Napier Tercentenary
Memorial Volume. London, 1915, 293-316.

4. Report of the committee appointed to select the
best faces of type and modes of display for
government printing. London: H. M. Stationery
Office, 1922.

5. Scott, W. D. The theory of advertising. Boston:
Small, Maynard and Co., 1903, 119-129.

6. Tinker, M. A. The relative legibility of Modern
and Old Style numerals. J. exper. Psychol.,
1930, 13, 453-461.



THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954 


Comparative and Single Stimulus Methods in Determining 
Taste Preferences * 


James A. Bayton and Charles M. Thomas 


Howard University 


Research designed to ascertain relative pref- 
erences for variations of a food product is 
essentially research on the judgmental proc- 
ess. Because of this the investigator must 
draw upon psychophysics in deriving his 
methodology (4). The psychophysical meth- 
ods available fall into two general categories 
—the methods of comparative judgment and 
the method of single stimulus (sometimes re- 
ferred to as the method of absolute judg- 
ments). All methods of comparative judg- 
ment are alike in that they require the Ss to 
make direct comparisons of the items in one 
session. In the method of single stimulus the 
Ss judge an item without a specific compari- 
son stimulus being present. 

In food preference research using the 
method of comparative judgment two varia- 
tions are frequently employed—paired com- 
parisons and rank order. When the number 
of items being investigated is three, the 
method of paired comparisons has proved 
efficient (1). With four items the number of 
pairings is too great for a taste test confined
to one session per S; in this instance the 
method of rank order has been used (2, 3). 
It can be argued, however, that any compara- 
tive judgment procedure is not realistic from 
the standpoint of actual consumer behavior. 
How often does the consumer make compara- 
tive judgments of the type involved in paired 
comparisons or rank order experimental de- 
signs? The consumer’s situation, in which he 
uses the product without a comparison item 
present, is a duplication of the method of
single stimulus. This being so, a critical
question becomes: what is the nature of the
ordering of food items, with respect to pref-
erence, by the two general procedures? The
present research is directed toward this ques-
tion.
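
The remark that four items produce too many pairings for one session follows from the count of distinct pairs, n(n - 1)/2. The sketch below is only an illustration of that arithmetic; the assumption that each pairing requires two tastings, and the function name, are ours and do not come from the article.

```python
# Sketch: how the number of paired comparisons grows with the number of items.
from itertools import combinations


def paired_comparisons(items):
    """Return every distinct pair of items, as a paired-comparison design requires."""
    return list(combinations(items, 2))


for n in (3, 4):
    juices = [f"juice {i}" for i in range(1, n + 1)]
    pairs = paired_comparisons(juices)
    # Assumption: each pairing means tasting both members once.
    print(f"{n} items: {len(pairs)} pairings, {2 * len(pairs)} tastings")
```

Three items give 3 pairings and four items give 6, whereas the rank order procedure needs only one initial tasting of each of the four juices before the order is verified.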

1 The authors are indebted to Dr. Forrest E.
Clements, of the Bureau of Agricultural Economics,
for his assistance in this research. The canned orange
juices were provided by the Florida Experiment Station
and the Bureau of Agricultural Economics.




The food items used were four canned 
orange juices that varied in °Brix and in 
Brix-acid ratio. °Brix is a measure of the 
specific gravity of sugar solutions; the Brix- 
acid ratio is a measure of the relation be- 
tween the °Brix and the amount of acid in 
the solution. Changes in °Brix or Brix-acid 
ratio are correlated with changes in the tart- 
sweet quality of the orange juice; the higher 
the °Brix or Brix-acid ratio the sweeter the 
juices taste. 


Experimental Design 


Subjects and Materials. The Ss were 120 in- 
dividuals 17 years of age and over (68 men; 52 
women). The Ss were randomly assigned to the 
various experimental groups. Each S was tested 
individually. 

The four canned orange juices used were: I. 
13.2 °Brix; 18.3 Brix-acid ratio; II. 13.4 °Brix; 
9.9 Brix-acid ratio; III. 9.0 °Brix; 18.3 Brix-acid 
ratio; and IV. 9.3 °Brix; 9.9 Brix-acid ratio. 

Variables such as variety of orange and peel- 
oil content were constant. 

The juices were kept under refrigeration so 
that they were always chilled when used in a test. 
The juices were served in non-waxed paper cups. 


Procedure. One-half of the Ss first judged 
the four juices using the method of rank order 
and, after a drink of water and a rest period, 
rated one of the juices under single stimulus 
conditions. The other half of the Ss reversed 
this procedure. The particular juice judged 
under single stimulus conditions was assigned 
to the Ss in a random manner—one-fourth 
of the Ss judging a given juice in this par- 
ticular manner. 

The procedure for the method of rank order 
was as follows. The four juices were placed 
in a row in front of the S. The original position
of the juices varied randomly from S
to S. The S tasted the juice on the extreme
left and placed it in front of the other three. 
The juice now on the left in the original row 
was tasted and placed to the left or right of 
the first one in terms of, “I like this one 
better” (placed to right) or, “I like that one 




Table 1
Rank Order Preferences for Four Canned Orange Juices

           13.2 °Brix    13.4 °Brix    9.0 °Brix     9.3 °Brix
Rank       18.3 Brix-    9.9 Brix-     18.3 Brix-    9.9 Brix-
order      acid ratio    acid ratio    acid ratio    acid ratio
1              83            25             7             5
2              18            52            34            16
3              13            24            39            44
4               6            19            40            55
Mn           1.52          2.31          2.93          3.24
N             120           120           120           120

χ² = 228.36; d.f. = 9; P < .01.


better” (placed to left). Each of the remain- 
ing juices was tasted and placed in the new 
row being developed. When the new order 
had been established the S took a sip of 
water and tasted the sequence again to verify 
his order of preference. He was permitted to 
change the order if he wished. The scoring 
was 1 to 4, starting with the most preferred 
juice—the one occupying the extreme right 
position. 

A rating scale was used for the method of 
single stimulus judgments. The scale was 
called a “Taste Thermometer.” The values
ranged from 0 to 100 with gradations of five 
indicated and the tens numbered. Opposite 




100 was the statement, “The best I have ever 
tasted”; opposite zero was, “The worst I have 
ever tasted.” At 50 was, “Fair; average.”
The space between 50 and 100 contained the
statement, “Better than average” and between
0 and 50 the statement, “Poorer than average.”
The S was told that he would be given
one juice to drink and that he would then 
give it a score. The Ss were not told that the 
juice was one of the four in the rank order 
procedure. 


Results 


The results with the rank order procedure 
are presented in Table 1. The mean rank for 
each juice is given but chi-square was used as 
the test of significance. The order of pref- 
erence was 13.2 °Brix; 18.3 Brix-acid ratio, 
13.4 °Brix; 9.9 Brix-acid ratio, 9.0 °Brix; 
18.3 Brix-acid ratio, and 9.3 °Brix; 9.9 Brix- 
acid ratio. Chi-square tests of all pairings 
revealed that each juice was significantly dif- 
ferent from the other juices. 
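
As a check on the figures in Table 1, the mean ranks and the overall chi-square can be recomputed from the tabled frequencies. The sketch below assumes the reported statistic is an ordinary contingency-table chi-square on the 4 x 4 table of ranks by juices; the article does not describe the computation, and the function names are hypothetical.

```python
# Sketch: recompute the mean ranks and chi-square for Table 1.
# Rows are ranks 1-4; columns are the four juices in the order tabled.
observed = [
    [83, 25, 7, 5],
    [18, 52, 34, 16],
    [13, 24, 39, 44],
    [6, 19, 40, 55],
]


def mean_ranks(table):
    """Mean rank for each column, weighting rank r by its frequency."""
    means = []
    for col in range(len(table[0])):
        total = sum(row[col] for row in table)
        weighted = sum((rank + 1) * row[col] for rank, row in enumerate(table))
        means.append(weighted / total)
    return means


def chi_square(table):
    """Pearson chi-square and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


print([round(m, 2) for m in mean_ranks(observed)])
statistic, df = chi_square(observed)
print(round(statistic, 2), df)
```

Under this assumption the computed chi-square is 228.4 with 9 degrees of freedom, which is close to the 228.36 reported, and the mean ranks match the tabled values.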

Table 2 gives the data for the rank order
procedure as a function of time of presentation
(before or after the single stimulus procedure).
In no instance were the rank order
distributions for a juice significantly different
between sessions.

The analysis of variance results for the
single stimulus data are shown in Table 3.


Table 2
Rank Order Preferences for Four Canned Orange Juices by Time of Presentation

           13.2 °Brix            13.4 °Brix            9.0 °Brix             9.3 °Brix
           18.3 Brix-acid        9.9 Brix-acid         18.3 Brix-acid        9.9 Brix-acid
           ratio                 ratio                 ratio                 ratio
           Before    After       Before    After       Before    After       Before    After
Rank       single    single      single    single      single    single      single    single
order      stimulus  stimulus    stimulus  stimulus    stimulus  stimulus    stimulus  stimulus
1            38        45          17         8           2         5           3         2
2            11         7          24        28          16        18           9         7
3             8         5          10        14          19        20          23        21
4             3         3           9        10          23        17          25        30
Mn          1.60      1.43        2.18      2.30        3.05      2.82        3.17      3.32
N            60        60          60        60          60        60          60        60
           χ² = [illegible]      χ² = 4.65             χ² = 2.32             χ² = 0.995
           d.f. = 3              d.f. = 3              d.f. = 3              d.f. = 3
           P > .05               P > .05               P > .05               P > .05




Table 3

Analysis of Variance for Single Stimulus Ratings of Four Canned Orange Juices

                                   Sum of                 Mean
Source of Variation                Squares      d.f.      Square      F
Juices                             4,895.00       3       1,631.67    4.669 (P < .01)
Time of presentation               2,803.34       1       2,803.34    8.022 (P < .01)
Juices X Time of presentation      1,028.34       3         342.78
Within                            39,136.99     112         349.44
Total                             47,863.67     119


F for variance between juices was significant. 
Inspection of the mean ratings, however, re- 
vealed the following: 13.2 °Brix; 18.3 Brix- 
acid ratio and 13.4 °Brix; 9.9 Brix-acid ratio 
had means of 58.00 and 57.83, respectively. 
The means for 9.0 °Brix; 18.3 Brix-acid ratio 
and 9.3 °Brix; 9.9 Brix-acid ratio were 46.00 
and 43.50, respectively. The differences were 
significant only when °Brix varied. In other 
words, the two high °Brix juices were signifi- 
cantly different from the two low °Brix juices. 
Variation in Brix-acid ratio did not yield sig- 
nificant differences in preference. 

The variance between presentations was 
significant. The mean rating for all juices,
when the single stimulus procedure came first, 
was 46.50. The mean rating for all juices, 
when this procedure followed the rank order 
method, was 56.17. 
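
The F ratios in Table 3 can be reproduced from the tabled sums of squares and degrees of freedom. The sketch below assumes the within-groups mean square is the error term for every effect, which is consistent with the tabled values but is not stated in the text; the variable and function names are hypothetical.

```python
# Sketch: recompute mean squares and F ratios from Table 3.
# Assumption: the "Within" mean square is the error term for each effect.
effects = {
    "Juices": (4895.00, 3),
    "Time of presentation": (2803.34, 1),
    "Juices x Time of presentation": (1028.34, 3),
}
within_ss, within_df = 39136.99, 112


def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratios(effects, within_ss, within_df):
    """F for each effect = effect mean square / within-groups mean square."""
    error_ms = mean_square(within_ss, within_df)
    return {name: mean_square(ss, df) / error_ms
            for name, (ss, df) in effects.items()}


for name, f_value in f_ratios(effects, within_ss, within_df).items():
    print(f"{name}: F = {f_value:.3f}")
```

The recomputed F values for juices and for time of presentation agree with the tabled 4.669 and 8.022. The interaction F, which the table leaves blank, falls below 1.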


Discussion 


The above results show that the preference
pattern for four canned orange juices is a
function of the experimental design used.
When the Ss followed the rank order design
both °Brix and Brix-acid ratio contributed
to preference differentiation. With each S
making only a single stimulus rating of one
juice (the four juices being randomly assigned
to four such groups) preference differentiation
occurred only in terms of °Brix. It
will be noted that in both methods preference
was associated with the relatively higher
°Brix (the sweeter juices).

A frame of reference factor was found in
the analysis of variance of the data obtained
by the method of single stimulus. Starting
“cold” with this procedure produced rather
low ratings for all juices. When this method
followed the rank order procedure the mean


ratings of the juices were appreciably higher. 

One limitation that must be placed upon 
these results rests in the fact that each S did 
not have experience with the four juices un- 
der single stimulus conditions. Another ex- 
periment, just completed, indicates that when 
such is the case no significant differences in 
mean ratings are obtained for juices that vary 
in Brix-acid ratio with °Brix held constant. 
This is the finding in the present experiment. 


Summary 


1. Preferences for four canned orange juices 
that varied in °Brix and in Brix-acid ratio 
were obtained by a method of comparative 
judgment procedure (rank order) and the 
method of single stimulus (using a rating 
scale). 

2. The rank order procedure produced pref- 
erence differences in terms of °Brix and of 
Brix-acid ratio. 

3. The single stimulus procedure produced
differences only in terms of °Brix.

4. From both methods it appears that pref- 
erence is associated with juices of relatively 
higher °Brix (the sweeter juices). 


Received November 20, 1953. 


References 


1. Bayton, J. A. Consumer preferences for selected 
frozen concentrated apple juice. Bureau of 
Agricultural Economics, June, 1951. 

2. Bayton, J. A. and Bell, H. P. Discrimination 
tests and preliminary preference ratings of 
frozen concentrates for lemonade. Bureau of 
Agricultural Economics, September, 1952. 

3. Bell, H. P. and Bayton, J. A. Taste tests on 
canned orange juice. Bureau of Agricultural 
Economics, June, 1953. 

4. Clements, F. E. Psychophysical methods in mar- 
ket research. Florida State Horticultural So- 
ciety, 1951, 64, 148-153. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Method of Single Stimulus Determinations of Taste Preference * 


Forrest E. Clements, James A. Bayton, and Hugh P. Bell 
United States Department of Agriculture, Washington, D. C. 


There are two fundamental considerations 
that would lead one to select a method of 
single stimulus approach in research on pref- 
erences for variations of a food product. 
First, the method of single stimulus is a 
duplication of the situation that is typical for 
consumers. Seldom does the consumer have 
available in his home at a given time several 
variations of a particular food product—the 
situation that would be conducive to making 
preference judgments based upon the direct 
comparisons involved in the method of com- 
parative judgments. Realistic research on 
taste preference should attempt to utilize the 
actual home situation. 

The second factor that will force the ex- 
perimental design into a method of single 
stimulus model is the number of the varia-
tions of the food product being tested since
adaptation is a variable. It has been our ex- 
perience, in either laboratory or home situa- 
tions, that three items is the maximum num- 
ber for a paired comparison design when test- 
ing occurs in one session; with a rank order 
design four items is the maximum (1, 2, 3, 4, 
5). When the number of items is five or 
more comparative judgment models should be 
abandoned for a method of single stimulus 
design. 
Of necessity, any method of single stimulus
design for determining taste preference will
require the use of some type of rating scale
or scoring system. Most scales used in such
research are either point-scales with terse defi-
nitions of each point or “thermometers” al-
lowing for 0 to 100 scoring with descriptions
such as “Excellent,” “Good,” etc., at selected
points. When designing a research project
that will involve a large-scale sample of a
cross-section of consumers the scale or scor-
ing system used will have to be easily under-
stood. Because of this, a third type of scale
was investigated in the present experiment.
This was a highly unstructured scale with
only the extremes of the continuum defined.
None of the points available for choice as ex-
pressing degree of preference was identified
or defined. The primary purpose of this ex-
periment was to test the relative efficiency of
the three types of scales in determining taste
preferences when the research is conducted
with a method of single stimulus design un-
der realistic conditions.

1 The canned orange juices used in this experi-
ment were furnished by the Florida Experiment
Station. The authors wish to acknowledge the ef-
forts of Mrs. Motier Fisher and Mrs. Mary George
Robinson who did the necessary field work. Discus-
sions with Dr. Franklin R. Kilpatrick led to our de-
velopment of Scale B.

The particular food items used were three
canned orange juices that varied in Brix-acid
ratio with °Brix held constant. °Brix is a
measure of the specific gravity of sugar solu-
tions; Brix-acid ratio is a measure of the re-
lation between the °Brix and the amount of
acid in the solution. The lower Brix-acid
ratios are characterized by tart-sour taste;
the higher Brix-acid ratios are sweeter in
taste. In addition, when °Brix is constant
and amount of acid varies the juices change
in body or consistency; the higher Brix-acid
ratios tend to be “thinner” as well as sweeter.

This research was preliminary to a full
scale investigation involving six canned orange
juices. To facilitate this preliminary project
the lowest and highest Brix-acid ratios in the
six juices and one from the middle set were
used.


Procedure 


Scales. Scale A ranged from 0 to 100 with
gradations of five numbered. To the right of the
scale certain areas were bracketed and labeled.
The area 90-100 was “Excellent”; 70-90 was
“Very Good”; 50-70 was “Good”; 30-50 was
“Fair”; 10-30 was “Poor”; 0-10 was “Very
Poor.” The Ss were instructed first to decide
what they thought of the juice in a general way
(“Very Good,” “Poor,” etc.) and then to rate
it by assigning a score in the particular area.

Scale B consisted of ten 5/16" squares arranged
vertically. Above the top square was
“Excellent”; below the bottom one was “Very
Poor.” No other definitions or descriptive statements
were given. The Ss were shown that their
opinion about the juice could be expressed as 
falling anywhere from “Very Poor” up through 
“Excellent.” They were to put a cross in the 
square that expressed their opinion. Scoring for 
Scale B was 1 to 10, starting with the bottom 
square. 
Scale C was the following 7-point scale: 


Excellent—the best canned orange juice I have 
ever tasted. 

Good—much better than other canned orange 
juice I have tasted; but not the best. 

Fair—a little better than other canned orange 
juice I have tasted; but not much better.

Borderline—can’t decide whether it is better 
or worse than other canned orange juice I 
have tasted. 

Poor—a little worse than other canned orange 
juice I have tasted; but not much worse. 
Very Poor—much worse than other canned 

orange juice I have tasted; but not the worst. 
Objectionable—the worst canned orange juice 
I have ever tasted. 


The Ss were instructed to check the square 
preceding the statement that expressed their opin- 
ion about the juice. The scoring was 1 to 7, 
starting with “Objectionable.”

Descriptive Check-List. After rating a juice 
the Ss were asked to check those items on a list 
that they thought described it. They could check 
as many items as they thought applied. The 
check-list contained items such as, “Too sweet,”
“Too tart or sour,” “Just the right sweetness,”
etc.

Experimental Designs. The Ss were adult 
members of households in a new residential area 
adjacent to Alexandria, Virginia. The area con- 
sisted of about 600 homes. Approximately every 
seventh home was contacted, yielding a panel of 
90 households. To be eligible a household had 
to contain at least two adults who agreed to 
participate throughout the experiment. 

Test I. The purpose of Test I was to obtain
preference ratings for the three canned orange
juices on each of the three scales. The 90 house-
holds were divided into three sets of 30 each.
Each set received a given scale. On the first
placement of the juices 10 households using a
given scale received 12 Brix-acid ratio, 10 re-
ceived 16 Brix-acid ratio, and 10 received 22
Brix-acid ratio. These assignments were made
in a random manner. Three placements were
made per household until each S had experience
with all of the juices. Three to four days in-
tervened between placements. The homemakers
were instructed to place the juice in the re-
frigerator overnight and to make the tests the
following day.
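
The first-placement layout of Test I (three scales, three juices, 10 households per cell) can be sketched as a simple random allocation. The article says only that the assignments were random; the shuffling procedure, the household numbering, and the function name below are assumptions for illustration, and the order of the second and third placements is not modeled because the text does not give it.

```python
# Sketch: randomly allocate 90 households to 3 scales x 3 first-placement juices.
# Assumption: a single shuffle followed by equal-sized blocks; the article does
# not say how the random assignment was actually carried out.
import random


def assign_first_placement(households, scales=("A", "B", "C"),
                           ratios=(12, 16, 22), seed=None):
    """Return {household: (scale, Brix-acid ratio of first juice)}."""
    rng = random.Random(seed)
    shuffled = list(households)
    rng.shuffle(shuffled)
    cells = [(scale, ratio) for scale in scales for ratio in ratios]
    per_cell = len(shuffled) // len(cells)
    plan = {}
    for index, cell in enumerate(cells):
        block = shuffled[index * per_cell:(index + 1) * per_cell]
        for household in block:
            plan[household] = cell
    return plan


plan = assign_first_placement(range(1, 91), seed=1954)
print(plan[1])  # the (scale, first juice) cell assigned to household 1
```

With 90 households every scale-by-juice cell receives exactly 10, matching the design described above.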

The juices were in unlabeled 10-ounce cans, 
coded for identification purposes. Only one can
of juice was left at a household per placement. 




This was done so that all the juice would be con- 
sumed when tested. As stated, at least two Ss 
per household made the tests. 

Test II. The purpose of Test II was to in- 
vestigate the reproducibility of the ratings for 
the 12 Brix-acid ratio juice on each scale. Fif- 
teen of the 30 households that worked with a 
given juice in Test I were selected. The Ss were 
told merely that they were rating “another” juice 
in our set. This test was conducted about one 
month after the completion of Test I. 

Test III. The purpose of Test III was to in- 
vestigate the reproducibility of the preference re- 
lationship obtained with Scale B in Test I for 
the 12 and 22 Brix-acid ratio juices. This test 
took place approximately two months after Test 
I. The 30 households that had worked with 
Scale B in Test I took part in this test. One- 
half of the households received the 12 Brix-acid 
ratio juice on the first placement, the remainder 
received 22 Brix-acid ratio on the second place- 
ment; after three days, the juices were reversed. 


Results 


The mean preference ratings on Scale A for 
the 12, 16, and 22 Brix-acid ratio juices were 
61.0, 60.1, and 57.4, respectively. The dif- 
ferences were not significant. Scale B yielded 
mean ratings for the respective juices of 5.7, 
5.8, and 5.8. On Scale C the means were 5.3, 
5.1, and 5.0. Neither of the latter two scales 
produced significant differences between the 
juices. 

Table 1 presents the preference data in
terms of whether an S scored the 12 Brix-acid 
ratio juice above or below the mean for that 
juice on a given scale. Those who scored the 
juice above the mean were designated as the 
“Like” group; those scoring it below the mean 
were called the “Dislike” group. On each 
scale, the “Like” 12 Brix-acid ratio group 
gave a significantly higher rating to that juice 
than to either the 16 or the 22 Brix-acid ratio 
juices. In these groups, however, the ratings 
for the latter two juices were not significantly 
different. Conversely, on each scale, the 
“Dislike” 12 Brix-acid ratio group tended to 
give higher ratings to the 16 and 22 Brix- 
acid ratio juices than to the 12 Brix-acid 
ratio juice. These differences were not sig- 
nificant for Scale A. In Scale C, the differ- 
ence between 12 and 16 Brix-acid ratio was 
significant; the difference between 12 and 22 
Brix-acid ratio was not significant. Both of
the differences involved were significant on 




Table 1

Preference Scores for Canned Orange Juices that Vary in Brix-acid Ratio by
“Liking” vs. “Disliking” Brix-acid Ratio 12 (Test I)

                      Brix-acid ratio (12 °Brix)      t for mean difference
                        12      16      22       12 vs. 16  12 vs. 22  16 vs. 22
“Like” Brix-acid Ratio 12
  Scale A
    Mn                 76.7    66.8    61.3        3.2**      4.1**      1.3
    SD                  7.0    16.0    17.7
  Scale B (N = 32)
    Mn                  7.5     5.5     5.7        4.0**      3.5**      0.3
    SD                  1.1     2.5     2.6
  Scale C
    Mn                  6.3     5.2     5.2        5.0**      4.4**      0.2
    SD                  0.5     1.2     1.5
“Dislike” Brix-acid Ratio 12
  Scale A (N = 29)
    Mn                 44.8    53.2    53.3        2.0        1.8        0.1
    SD                 11.4    18.5    19.3
  Scale B (N = 25)
    Mn                  3.4     6.2     6.1        5.4**      4.1**      0.4
    SD                  1.4     1.9     2.4
  Scale C (N = 25)
    Mn                  4.0     5.0     4.6        3.0**      1.3        0.8
    SD                  1.2     1.2     1.8


** Significant at the 1 per cent level.


Scale B. For the “Dislike” 12 Brix-acid
ratio group, none of the 16-22 Brix-acid ratio
differences were significant.
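
The t values in Table 1 compare mean ratings given to two juices by the same Ss. The report does not state which t formula was used; the following is a minimal sketch that assumes a paired (correlated-means) t test. The function name and the ratings are hypothetical illustrations, not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(ratings_a, ratings_b):
    """t statistic for the mean of paired differences (same Ss rate both juices)."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("paired ratings must have equal length")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD with n - 1 in the denominator).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se

# Hypothetical ratings from six Ss for two juices (assumed values, not study data).
juice_12 = [8, 7, 9, 6, 8, 7]
juice_16 = [5, 6, 6, 5, 7, 4]
t = paired_t(juice_12, juice_16)
```

The resulting t would be referred to a t table with n - 1 degrees of freedom to judge significance at the 5 or 1 per cent level.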

A similar analysis was made with the 16 
Brix-acid ratio juice. The general pattern 
was that when this particular juice was 
“liked” its mean was higher than those for 
either the 12 or 22 Brix-acid ratio juices. 
The respective differences involved were sig- 
nificant on each scale. When the 16 Brix- 
acid ratio juice was “disliked” its mean was 
lower than those for the 12 or 22 Brix-acid 
ratios. Both of the differences involved were 
significant only for Scales B and C.

Table 2 repeats the above analysis in 
terms of “Like” and “Dislike” 22 Brix-acid 
ratio for each scale. The pattern in this in- 
stance was for those who “liked” the 22 Brix- 
acid ratio juice to give the other two juices 
lower scores. Both of the particular differ- 
ences involved were significant on Scales A 
and B. Those who “disliked” the 22 Brix-acid
ratio juice gave the other two juices
higher scores than observed for the 22 Brix-acid
ratio. Both differences were significant
on each scale.

The data on reproducibility of the preference
ratings for the 12 Brix-acid ratio juice,
after a month had passed, showed that for
each scale the difference in preference rating
was not significant.

The results on reproducibility of the preference
data on Scale B for the 12 and 22
Brix-acid ratio juices two months after Test
I demonstrated that the mean preference ratings
for the two juices again were not significantly
different. However, the division of the
Ss into “liking” and “disliking” a respective
juice revealed a pattern similar to that obtained
in the original analysis. Those who
“liked” a given juice tended to give the other
one lower ratings; when a juice was “disliked”
the other juice was given higher ratings.
The difference was not significant, however,
for those who “disliked” the 12 Brix-acid
ratio juice.


Table 2

Preference Scores for Canned Orange Juices that Vary in Brix-acid Ratio by
“Liking” vs. “Disliking” Brix-acid Ratio 22 (Test I)

                      Brix-acid ratio (12 °Brix)      t for mean difference
                        12      16      22       12 vs. 16  12 vs. 22  16 vs. 22
“Like” Brix-acid Ratio 22
  Scale A (N = 30)
    Mn                 62.2    64.3    73.0        0.5        2.9**      2.4*
    SD                 18.7    18.8    10.5
  Scale B (N = 32)
    Mn                  5.4     6.4     7.8        1.5        4.4**      [illegible]
    SD                  2.5     2.1     1.3
  Scale C (N = 44)
    Mn                  5.5     4.9     5.8        1.8        1.2        3.9**
    SD                  1.5     1.3     0.8
“Dislike” Brix-acid Ratio 22
  Scale A (N = 29)
    Mn                 59.8    55.9    41.2        1.0        4.6**      [illegible]
    SD                 19.2    18.9    10.7
  Scale B (N = 25)
    Mn                  6.1     5.0     3.4        1.7        3.3**      3.0**
    SD                  2.2     2.4     1.2
  Scale C (N = 16)
    Mn                  4.9     5.4     2.6        1.7        5.5**      8.8**
    SD                  1.3     1.0     1.1


* Significant at the 5 per cent level.
** Significant at the 1 per cent level.


Table 3

Descriptions of Canned Orange Juices that
Vary in Brix-acid Ratio

                                         Brix-acid ratio (12 °Brix)
Description                               12           16           22
                                       Per cent     Per cent     Per cent
Too tart or sour                          39      [illegible]  [illegible]
Too sweet                             [illegible] [illegible]      28
Too thin or watery                        32      [illegible]      35
Too artificial                            25           31           30
Just the right sweetness                  35           44           38
Just the right tartness or sourness       30           18           17
Does not taste like fresh orange
  juice, but still is pretty good         51           55           53
Tastes like fresh orange juice            11            9            7

Number                                   175          175          175


Note: Percentages add to more than 100 because
some Ss checked more than one descriptive item.

The descriptions of the three canned orange 
juices by all Ss, regardless of scale used, are 
presented in Table 3. The percentage of Ss 
who described the juices as being too tart or 
sour decreased from the 12 to the 22 Brix- 
acid ratio. The percentage of Ss describing a 
juice as too sweet increased from the 12 to 
the 22 Brix-acid ratio. “Too thin or watery” 
was most frequently given for the 22 Brix- 
acid ratio juice. Approximately 50 per cent 
of the Ss said that although these juices did 
not taste like fresh orange juice they still 
were “pretty good.” 

In Table 4 the descriptions have been ana- 
lyzed in terms of those “liking” or “disliking” 
the 12 and 22 Brix-acid ratio juices. Those 
who “disliked” the 12 Brix-acid ratio juice 
tended to describe it as too tart or sour and 
too artificial. The Ss who “disliked” the 22 
Brix-acid ratio juice tended to say it was too 
sweet, too thin or watery, and too artificial.
Approximately 50 per cent of those who liked 




Table 4 


Descriptions of Canned Orange Juices by “Liking” vs. “Disliking”
Brix-acid Ratio 12 and Brix-acid Ratio 22 (12 °Brix)


                       12 Brix-acid ratio        22 Brix-acid ratio
Description            “Like”     “Dislike”      “Like”     “Dislike”
                       Per cent   Per cent       Per cent   Per cent
56 8 
Too tart or sour a > r n 
Too sweet 0 1 e 
Too thin or watery 5 aA 5 A 
Too artificial k: e 9 = 
Just the right sweetness a f 
Just the right tartness or sourness - 44 10 
Does not taste like fresh orange juice, ki se 
but still is pretty good , 74 28 1 f 
Tastes like fresh orange juice 17 1 
81 
Number 100 78 100 


Note: Percentages add to more than 100 because some Ss checked more than one descriptive item. 


a juice said it had “just the right sweetness.” 
“Just the right tartness or sourness” was 
more frequently used to describe the 12 and 
the 22 Brix-acid ratio among those who 
“liked” these respective juices. 


Discussion 


Test I can be viewed as three independent 
experiments on preferences for these canned 
orange juices; each experiment involving a 
different scale. The general pattern of the 
preference results was similar for each scale. 
Regardless of the scale, the means of the 
preference ratings, for all Ss, were not sig- 
nificantly different. It had been expected 
that the 12 Brix-acid ratio would be too tart 
and the 22 Brix-acid ratio too sweet, thus 
producing the highest mean preference rat- 
ings for the 16 Brix-acid ratio juice. This 
expectation was based upon the assumption 
that we would be dealing with a sample from 
one population. Taking the means for all Ss 
per juice at their face value one would con- 
clude that one juice was as likely to be pre- 
ferred as another. This, in turn, would raise 
the question of whether the Ss really could 
distinguish between the three juices in this 
method of single stimulus approach although 
prior comparative judgment experiments have 
shown that these juices are discriminable (3). 


The division of the Ss into “liking” or “disliking”
a given juice gave evidence that, with
respect to canned orange juices, there are two
basic populations that we sampled. One
population likes a tart juice; the other likes
a sweet juice. Those who “liked” a tart
juice gave it a relatively high score and gave
a low score to the sweeter juice. The Ss who
“disliked” the tart juice gave it a low score
and assigned a relatively high score to the
sweeter juice. Obviously, this phenomenon
produced a cancelling-out effect on the means
per juice for all Ss. This effect is particularly
striking since the results were obtained
with a method of single stimulus experimental
design.
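
The cancelling-out effect can be illustrated with a small numerical sketch. The ratings below are hypothetical (they are not data from this study) and were chosen only to show how two opposed groups can yield identical overall means; the helper names are likewise assumptions. The “Like” and “Dislike” groups are formed, as in the Results, by whether an S scored the tart juice above or below its mean.

```python
from statistics import mean

# Hypothetical (tart juice, sweet juice) ratings on a 10-point scale; not study data.
ratings = [
    (8, 4), (9, 3), (7, 4), (8, 5),   # Ss who favor the tart juice
    (3, 8), (4, 9), (4, 7), (5, 8),   # Ss who favor the sweet juice
]

def split_by_liking(pairs):
    """Split Ss by whether they scored the tart juice above or below its mean."""
    cutoff = mean(tart for tart, _ in pairs)
    like = [p for p in pairs if p[0] > cutoff]
    dislike = [p for p in pairs if p[0] < cutoff]
    return like, dislike

def juice_means(pairs):
    """Return (mean tart rating, mean sweet rating) for a group of Ss."""
    return mean(tart for tart, _ in pairs), mean(sweet for _, sweet in pairs)

overall = juice_means(ratings)          # identical means for both juices
like, dislike = split_by_liking(ratings)
like_means = juice_means(like)          # tart juice rated well above sweet
dislike_means = juice_means(dislike)    # sweet juice rated well above tart
```

In this constructed case the overall means are equal even though every S shows a clear preference, which is the pattern the two-population interpretation describes.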

That this phenomenon is no artifact is seen
in its demonstration in Test I with three different
scales. Further evidence is seen in the
replication of the experiment (Test III) after
two months, using the 12 and 22 Brix-acid
ratio juices. Once again, the difference between
the juices, for all Ss, was not significant.
However, analysis in terms of “liking”
and “disliking” showed that the Ss who
“liked” one juice scored it relatively high in
contrast to the score given the other juice.

The data from the descriptive check-lists
show that the Ss were responding to the critical
variables in these juices (tartness-sweetness
and body or consistency). Furthermore,
the data from all Ss yield additional support 
for the conclusion that two populations were 
involved. The percentage describing a juice 
as too tart or sour decreased from the 12 
through 22 Brix-acid ratio juices. The re- 
verse was true for those calling these juices 
too sweet. When the descriptions were con- 
sidered in terms of “liking” or “disliking” 
one of these juices they revealed that the Ss 
were responding to the tart-sweet dichotomy. 

The Ss were asked whether they liked or- 
ange juice “somewhat on the tart side or 
somewhat on the sweet side.” Forty-seven 
per cent said they liked it tart, 46 per cent 
replied “sweet,” and 7 per cent volunteered 
the information that they liked it “medium,” 
“in-between,” etc. Although the distribution 
of these replies again supports the two-popu- 
lation concept they were not indicative of 
how the juices were scored. There was only 
low correlation between the replies to this 
question and the preference scores for the 
juices. This can only mean that the ques- 
tion does not locate those Ss who actually 
prefer tartness or sweetness under direct ex- 
perience with the juices. 

The question now becomes whether there 
was any difference in the efficiency of the 
three scales in revealing the preference pat- 
tern. It has been pointed out that the pref- 
erence pattern was similar for the three 
scales. Inspection of the t’s for the “Like”- 
“Dislike” data shows that Scale B tended to 
give higher significance values than the other 
two scales. The median t for Scale A was 
2.00, for Scale B was 3.04, and for Scale C 
was 2.51. 

The reproducibility test with the 12 Brix-
acid ratio juice did not produce significantly
different ratings on any scale. However, it 
should be noted that Scale B came closer to 
doing this than did the other two scales. 

There is an indication that as the Ss con- 
tinued to work with these juices the prefer- 
ence ratings tended to rise. In Test I the 
means for all Ss for the 12 and 22 Brix-acid 
ratio were 5.70 and 5.84, respectively. In 
Test III the respective means were 6.73 and 
6.17. This supports the prior finding that 




repeated experience with the juices, under 
single stimulus conditions, produces generally 
higher ratings (5). In spite of this general 
increase in preference the “Like”-“Dislike”
patterns still existed. 

On the basis of the results of this experi- 
ment it was decided to use Scale B in a 720 
household study of preferences for six canned 
orange juices that vary in Brix-acid ratio, 
with °Brix constant. Scale B seems to be 
somewhat more efficient in revealing prefer- 
ence patterns and has the advantage of mini- 
mizing language and intellectual difficulties. 


Summary 


1. Using a method of single stimulus de- 
sign three canned orange juices that varied in 
tartness-sweetness and in body or consistency 
were given preference ratings. Three differ- 
ent scales were used, each S working with 
only one scale. 

2. Under method of single stimulus condi- 
tions the Ss were able to respond to the vari- 
ables of tartness-sweetness and body or con- 
sistency. 

3. The results indicated that there are two 
populations with respect to preference for 
canned orange juice—one prefers a tart juice, 
the other a sweet one. 

4. A relatively unstructured scale, with 
only the ends of the continuum defined, 
tended to be most efficient. All three scales, 
however, revealed the same pattern of prefer- 
ence. 


Received December 12, 1953. 


References 


1. Bayton, J. A. Consumer preferences for selected
frozen concentrated apple juice. Bur. agri. 
Econ., June, 1951. 

2. Bayton, J. A. and Bell, H. P. Discrimination 
tests and preliminary preference ratings of 
frozen concentrates for lemonade. Bur. agri. 
Econ., September, 1952. 

3. Bell, H. P. and Bayton, J. A. Taste tests on 
canned orange juice. Bur. agri. Econ., June, 
1953. 

4. Clements, F. E. Psychophysical methods in mar- 
ket research. Fla. hort. Soc., 1951, 64, 118- 
153. 

5. Thomas, C. M. Comparative vs. single stimulus
methods in determining taste preferences. Unpublished
master’s thesis, Howard Univ., 1952.


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


The Effect on Performance of Tilting the Toll-Operator’s Keyset * 


Edythe M. Scales 
Bell Telephone Laboratories Inc., Murray Hill, New Jersey 


and 


Alphonse Chapanis 


Department of Psychology, The Johns Hopkins University


Although engineers and human engineers
frequently recommend the use of inclined
visual displays and panels on control consoles,
there is virtually no scientific evidence
to show that this design practice has any
measurable effect on operator efficiency (2).
The present study was undertaken to discover
whether tilting the keyset now used by
long-distance operators would have any effect
on their keying performance. In view of the
lack of experimental evidence in this area,
we believe that our findings may be of general
interest.


* The experiment reported here was done at the
Bell Telephone Laboratories.


Experimental Method


Apparatus. The long-distance operator's keyset
is a ten-button set with the numbers and letters
arranged in two vertical rows of five. A
third column contains two keys, the KP key,
which sets up the apparatus to receive a number
sequence, and the ST key, which clears the machine
of the sequence just keyed (see Figure 1).
The keyset is normally mounted on a horizontal
working surface about 13 inches back from the
front edge of this surface, and 9 inches to the
right of center. The experimental apparatus
shown in Figure 2 approximates the toll-operator’s
position in its essential dimensions. In
normal operation, the keyset is horizontal. In
the present study, the keyset was mounted on
hinges so that it could be inclined at eight angles
relative to the working surface: 0, 5, 10, 15, 20,
25, 30, and 40 degrees.

A remotely-located recorder printed numbers
corresponding to the ten number-letter keys of
the keyset.

The illumination of the experimental room was
constant throughout the test at an adequate intensity.


Fig. 1. Top view of a toll-operator’s keyset.


Fig. 2. A schematic illustration of the toll-operator’s
position in this experiment.


Materials. The stimuli for the keying task
were ten-place number and letter combinations of
the following form: 3 digits, space, 2 letters,
digit, space, 4 digits. In long-distance operation
the first three digits are the “code” to the distant
location, the two letters and the next digit
give the subscriber’s exchange, and the remaining
four digits the subscriber’s number. For these
tests, the numbers were obtained from a table [...].
The letters were also [...] a special table which [...]
combinations appeared [...] at the letters Q and [...].
[...] stimuli were presented [...].
Twenty-four different [...] stimuli, were used in [...]
(practice sessions). [...] containing 100 stimuli [...]
part (test sessions).

The experiment was divided into two consecutive
parts, practice sessions and test sessions, each
part extending over eight days. Two pairs of
8 by 8 Latin squares (four in all) were used, the
main-effect variables of each being: (1) subjects;
(2) days; and (3) inclinations of the keyset.
One pair of identical Latin squares was assigned
to the practice sessions; the other pair of
identical Latin squares to the test sessions.
One square of each pair was
assigned to 8 male subjects and the other to 8
female subjects.
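
The report does not reproduce the Latin squares that were actually used. As a minimal sketch, assuming a simple cyclic construction (a hypothetical choice, not the authors' stated method), an 8 by 8 square can be built so that each subject meets every inclination once and each inclination occurs once on every day:

```python
# Keyset inclinations in degrees, as listed under Apparatus.
ANGLES = [0, 5, 10, 15, 20, 25, 30, 40]

def cyclic_latin_square(treatments):
    """Row i is the treatment list rotated left by i places (assumed construction)."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

def is_latin_square(square):
    """True if every treatment occurs exactly once in each row and each column."""
    symbols = set(square[0])
    rows_ok = all(len(row) == len(symbols) and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

# Rows stand for the 8 subjects of one group, columns for the 8 days of one part.
square = cyclic_latin_square(ANGLES)
for subject, row in zip("ABCDEFGH", square):
    print(subject, row)
```

In practice the rows and columns of such a square are usually permuted at random before use; the cyclic form is shown only because it is the simplest to verify.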

Procedure. Instructions read to each subject 
at the beginning of the first practice session cov- 
ered these essential points: 


1. Position of the subject with respect to the 
keyset. 

2. Technique of keying. (All keying was done 
with the first or second finger of the right 
hand.) 

3. The criterion. (Primary emphasis was placed
on accuracy.)

4. Procedure to follow when an error was dis- 
covered. 


Because this last instruction required the subject 
to stop and rekey the entire number whenever he 
thought he made an error, the time and error 
measurements are not entirely independent. 
Later on, however, we will see that this is not 
an important consideration.

On each day of the practice sessions, all subjects
keyed three number lists with a 5-minute
rest between lists. On each day of the test sessions,
all subjects keyed two number lists with a
10-minute rest between lists.

Subjects. Sixteen subjects, eight male and
eight female, participated in this study. Their
ages were between 18 and 35. No subject had
previous experience on this keyset. One female
subject did not participate on the last day of the
test sessions.




Results 


All data expressed in the following graphs 
have been computed from individual error 
and time values. An individual error value 
is the percentage of incorrect keyings made 
by a subject, based on keying 150 numbers 
in a practice session or 200 numbers in a test 
session. An individual time value is the to- 
tal time required for a subject to key the 
three number lists in a practice session or the 
two number lists in a test session. 

For this kind of experimental design, an 
analysis of variance is usually employed to 
evaluate the data. Although such analyses 
were carried out in the present study, the re- 
sults are much more clearly described in the 
accompanying graphs. All graphs depict 
three statistical measures: (1) the arithmetic 
mean; (2) the mean plus and minus one 
standard deviation; and (3) the total range 
of values. 

Effect of Tilt. Figures 3 and 4 clearly


[Figure 3. Vertical axis: per cent error; horizontal axis: inclination of keyset in degrees; panels: practice, test.]

Fig. 3. The data at each inclination are based on
the percentage of errors made by 16 subjects each of
whom keyed 150 numbers (practice sessions) or 200
numbers (test sessions). (At 25° for the test sessions
there were only 15 subjects.) The short horizontal
line is the mean, the solid vertical bar the
mean plus and minus one standard deviation, the
thin vertical bar the range of individual error
percentages.


[Figure 4. Vertical axis: time in minutes; horizontal axis: inclination of keyset in degrees; recovered panel label: practice.]

Fig. 4. These time data correspond to the error
data of Figure 3. The basic datum is the total time
required by each subject to key a set of numbers.
The mean, standard deviation, and range are represented
as in Figure 3. The arrow at 25° shows the
mean estimated for 16 subjects (see text).


demonstrate that inclination of the keyset has 
virtually no effect on keying performance, 
either in terms of error or time. Figure 3, 
for example, shows that the averages for the 
test sessions are within a small range: 2.5 to 
3.7 per cent. Moreover, there is no evidence 
of any systematic trend in the mean values 
as a function of keyset inclination. A straight
line with zero slope appears to fit these data 
adequately. It is not likely that the data are 
appreciably affected by the fact that one sub- 
ject was not tested at the 25-degree inclina- 
tion. 

Figure 4 also shows that the average times 
lie within a small range. For the practice 
sessions this range is 1.5 minutes (27.3 to 
28.8 minutes). For the test sessions the 
range is 1.1 minutes (31.1 to 32.3 minutes) 
provided that the estimated value for 25 de- 
grees is used. Since the subject who was not 


[Figure 5. Horizontal axis: day of test; panels: practice, test.]

Fig. 5. The error data for each day are based on
the performance of 16 subjects, except that on the
last day there were only 15 subjects. The mean,
standard deviation, and range are represented as in
Figure 3. The curves through successive means were
drawn by inspection.


tested at 25 degrees (Subject E, Figure 8) 
had the longest average keying time, the 
mean for 25 degrees is undoubtedly too low 
because of this omission. The arrow in Figure
4 shows the estimated value for the mean
on the assumption that Subject E had turned
in a value equal to her average keying time.
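
The estimate marked by the arrow amounts to filling the one missing score with the absent subject's own average and then recomputing the mean. A minimal sketch of that arithmetic follows; the function name and all numbers are hypothetical and are not values from this experiment.

```python
from statistics import mean

def mean_with_imputed(observed, missing_subject_average):
    """Mean over all subjects after filling one missing score with that subject's average."""
    return mean(list(observed) + [missing_subject_average])

# Hypothetical keying times in minutes for the subjects tested at one inclination.
observed_times = [30.0, 31.0, 32.0]
# Hypothetical average time of the slow subject who missed that inclination.
slow_subject_average = 39.0

observed_mean = mean(observed_times)
estimated_mean = mean_with_imputed(observed_times, slow_subject_average)
```

Because the missing subject is slower than the rest, the estimated mean is higher than the mean of the observed scores, which is why the uncorrected value is described as too low.
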
Learning. Figures 5 and 6 show the course
of learning in terms of errors and time, respectively.
Both show a large and significant
decrease due to learning throughout the
first eight days of test. Errors do not show
a significant decline in the second eight days

[Figure 6. Vertical axis: time in minutes; horizontal axis: day of test; recovered panel label: practice.]

Fig. 6. These time data correspond to the error
data in Figure 5. The mean, standard deviation,
and range are represented as in Figure 3. The
curves through successive means were drawn by
inspection.


[Figure 7. Panels: female, male.]

Fig. 7. These error data show each subject's performance
for the eight test sessions. (The data for
E are based on only seven test sessions.) The mean,
standard deviation, and range are represented as in
Figure 3.


although the keying times do. It is apparent, 
therefore, that learning was not complete even 
after 16 days of test. In this experiment 
the effects attributable to learning are much 
greater than those produced by variations in 
the inclination of the keyset. 

Individual Differences. Figures 7 and 8 
are plots of individual keying performances 
in terms of error and time for only the test 
sessions. These graphs show the most im- 
portant source of variance in our experiment. 
The averages in Figure 7 cover a range from 
0.5 to 6.6 per cent. In Figure 8 the range is 
from 24.8 to 39.7 minutes. 

Rank-order correlation coefficients between 
average errors and average times for the test 
sessions were computed for the male and fe- 
male subjects separately. For the female 
subjects, the coefficient was +0.07; for the


[Figure 8. Vertical axis: time in minutes; horizontal axis: subjects (E, C, G, A, D, B, H, F; P, L, O, N, J, M, K, I); recovered panel label: female.]

Fig. 8. These time data correspond to the error
data in Figure 7. The mean, standard deviation,
and range are represented as in Figure 3.


male subjects, -0.76. We have no explanation
for the difference between the magnitudes
of these two correlation coefficients. 
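
For readers who wish to reproduce this kind of coefficient, a minimal sketch of a rank-order (Spearman) correlation is given below. It computes the product-moment correlation of the ranks, with tied values sharing an average rank. The function names and the eight pairs of scores are hypothetical; the individual averages behind the reported coefficients are not tabulated in this report.

```python
def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Product-moment correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical average errors (per cent) and times (minutes) for eight subjects.
errors = [1, 2, 3, 4, 5, 6, 7, 8]
times = [40, 37, 35, 31, 33, 28, 27, 25]
rho = spearman_rho(errors, times)
```

A strongly negative value, as in this constructed example, would mean that the subjects who keyed fastest also made the most errors.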


Discussion 


In the specific work situation of the pres- 
ent study, we have found performance to be 
unaffected by the inclination of the working 
surface. However, spontaneous comments 
from all of our subjects indicated that they 
preferred an inclined keyset surface to a hori- 
zontal one. Furthermore, about half of the 
subjects expressed a preference for a keyset 
inclination between 15 and 25 degrees. 

These subjective preferences, as well as the 
quantitative data, are in agreement with an- 
other specific investigation that was concerned 
with speed and accuracy of target indication 
on a radar which was mounted at various in- 
clinations (1). Since the nature of the tasks 
in these two situations differs so radically, 
the agreement between the two sets of re- 
sults suggests that we can perhaps apply the 
findings to other work situations. If a work- 
ing surface is clearly visible to the operator 
and if it is within easy reach, inclining the 
work surface will probably not result in any 
measurable effect on performance. People 
seem to like inclined surfaces better than 
horizontal ones, but we have no way of evalu- 
ating the importance of such preferences. 

Many of the standard deviations in Figures
3 through 8 are large because the data
are not homogeneous, i.e., they include sev- 
eral sources of variance. For example, the 
standard deviations for each keyset inclina- 
tion in Figures 3 and 4 include the differences 
between subjects and the differences between 
days, both of which are large. In Figures 5 
and 6, the standard deviations include differ- 
ences between subjects and between inclina- 
tions. In this case the standard deviations 
are smaller because, as we have seen, varia- 
tions produced by keyset inclination are small. 
In Figures 7 and 8, the standard deviations 
are small because the variations attributable 
to inclinations and days (for the test sessions 
only) are also small. 

Earlier we noted that the time and error 
scores are not independent. If this were an 




appreciable factor in this experiment, we 
should expect the two values to be positively 
correlated. Actually they are not. In addi- 
tion, we should note that in the test sessions 
there were only a few errors committed and, 
of those made, less than one in three was de- 
tected and rekeyed by the subject. All in 
all, therefore, we do not believe that this is 
an important consideration in these data. 


Summary 


The present experiment investigated two 
measures of keying performance, accuracy 
and time, as a function of inclination of the 
keyset. The keyset was inclined at eight 
angles, 0, 5, 10, 15, 20, 25, 30, and 40 de- 
grees, relative to the working surface. 

The test was divided into two parts, prac- 
tice sessions and test sessions. The subject's 
task was to key lists of ten-place number and 
letter combinations. Eight by eight Latin 
squares were used, the principal variables be- 




ing subjects, days, and inclinations of the key- 
set. The results clearly demonstrate that: 


1. Keying accuracy and keying time are 
independent of the inclination of the keyset.

2. Both accuracy and speed increased sig- 
nificantly throughout the sixteen days of test. 

3. The greatest source of variation in this 
experiment is that produced by differences 
between subjects. 


Received August 12, 1954.
Early publication. 


References 


1. Leyzorek, M. Mounting angle of a VJ remote 
radar indicator and its effect on operator per- 
formance. Special Devices Center, Office of 
Naval Research, Report No. 166-I-41, Feb-
ruary 1948.

2. Stellar, E. Human factors in panel design. Chap- 
ter 6 in Panel on Psychology and Physiology: 
Committee on Undersea Warfare. A survey 
report on human factors in undersea warfare. 
Washington, D. C.: National Research Coun- 
cil, 1949. Pp. 153-175. 


THE JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954 


The Use of a Joy-Stick in Making Settings on a Simulated Scope Face * 


William Leroy Jenkins and A. Charles Karr 


Lehigh University 


An earlier study¹ reported the use of levers
in making settings on a linear scale. The 
most important variable proved to be the 
ratio between the movement of the lever tip 
(L) and the movement of the pointer (P). 
An L/P ratio of approximately three was 
found to be optimal. The current investiga- 
tion extends the problem into two dimensions, 
using a joy-stick to set a cursor on a simu- 
lated scope face. 

An operational diagram of the apparatus 
is shown in Figure 1. A vertical twelve-inch 
aluminum disc, with its center at approxi- 
mately eye-level and about 24” from the sub- 
ject’s eyes, simulates a scope face. Seven 
quarter-inch circular lucite inserts are spaced 
around a ten-inch diameter, six inserts around 
a seven-inch diameter, and four around a 
three-inch diameter. The cursor (a brass 
disc .150” in diameter) is controlled by a joy- 
stick placed between the subject’s knees with 
its tip about six inches below the edge of the 
simulated scope. 

Right-left components of the joy-stick move- 
ment are transmitted through the lower shafts 
to the small pulley and then to the upper 
shaft, causing the long cylinder to move right 
and left across the simulated scope face. 
Various ratios of movement between joy- 
stick and upper shaft are obtained by shift- 
ing the belt attachments along the bar at the 
end of the upper shaft. 

Front-back components of the joy-stick
movement operate a hydraulic pump that
serves to move the piston up and down in the
long cylinder. Ratios between movement of 
the joy-stick and movement of the piston are 
changed by sliding the attachment of the 
hydraulic pump up or down on the joy-stick. 

* This research was executed under Contract AF 
18(600)-24 between the Institute of Research, Le- 
high University, and the USAF Wright Air Develop- 
ment Center, Aero Medical Laboratory, Wright-Pat- 
terson Air Force Base, Dayton, Ohio. 

1 Jenkins, W. L. and Olson, M. W. The use of
levers in making settings on a linear scale. J. appl.
Psychol., 1952, 36, 269-271. Also USAF Technical
Report No. 6563.


Since the viscous friction of the right-left 
system is less than that of the front-back 
system, it is necessary to equalize the kines- 
thetic feel by adding viscous friction to the 
right-left system. This is done by adjusting 
a Prony brake, liberally coated with graphite 
lubricant, which adds a viscous drag, until the 
right-left viscous friction seems equal to the 
front-back friction. 

The cursor and scoring mechanism are 
mounted at the top of the piston that moves 
up and down in the long cylinder. The scor- 
ing mechanism operates as follows: When the 
subject has completed a setting he pushes 
a switch which discharges a condenser into 
a small electromagnet. The electromagnet 
moves the lucite strip bearing the brass 
cursor, so that the brass disc comes in con- 
tact with the scope face for a fraction of a 
second. If the cursor touches only the lucite 
insert, no electrical contact is made and a 
green light glows. If the cursor is not en- 
tirely within the confines of the insert, elec- 
trical contact is made between the brass disc 
and the aluminum scope face, lighting a red 
light to indicate a mis-setting. 
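
The scoring rule implies a fixed tolerance. With a cursor .150” in diameter and a quarter-inch insert, the cursor lies entirely within the insert only when its center is no more than .050” from the center of the insert. The sketch below expresses this geometry; it assumes ideal circles and a flat scope face, and the function name is hypothetical.

```python
import math

CURSOR_DIAMETER = 0.150   # inches, brass disc
INSERT_DIAMETER = 0.250   # inches, quarter-inch lucite insert

# Largest allowed distance between centers: difference of the two radii.
MAX_OFFSET = (INSERT_DIAMETER - CURSOR_DIAMETER) / 2

def is_correct_setting(dx, dy):
    """True (green light) if the cursor lies wholly inside the insert.

    dx, dy: horizontal and vertical offsets, in inches, of the cursor
    center from the insert center.
    """
    return math.hypot(dx, dy) <= MAX_OFFSET

print(is_correct_setting(0.03, 0.03))  # inside the tolerance circle
print(is_correct_setting(0.04, 0.04))  # outside: would light the red lamp
```

Note that the tolerance applies to the radial distance, so an offset of .040” in each direction at once is a mis-setting even though either offset alone would be acceptable.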


Procedure 


The procedure for a single setting is as 
follows: Following a ready signal, the ex- 
perimenter moves a switch that simultane- 
ously lights one of the inserts and starts the 
timing clock. The subject moves the joy- 
stick to bring the cursor onto the lighted in- 
sert, and then pushes a button that simul- 
taneously operates the scoring mechanism and 
stops the timing clock. The elapsed time on 
the clock shows the setting time, and a green 
or a red light indicates whether the setting is 
correct. 


Results 


For clarity, the results will be described in 
five parts, paralleling the chronological order 
of the experiments. 


[Figure 1. Labeled parts: aluminum disc with lucite inserts; adjustable belt attachment; piston; cylinder; cursor and scoring mechanism; Prony brake; hydraulic pump; joy-stick; adjustable pump attachment.]

Fig. 1. Operational Diagram of Joy-Stick Apparatus.


L/P Ratio in General. By combining vari- 
ous lever lengths and apparatus settings, 
twelve L/P ratios between 1.0 and 3.9 were 
tested, with nine target positions. Each of 
20 subjects made 20 settings at each com- 
bination of L/P ratio and target position. 

Table 1 shows for the twelve L/P ratios 
the mean setting time, variability, and mis- 
settings. In all three respects, ratios of 2.0 
and above appear to be clearly more favor- 
able than the ratios of 1.7 and under. Al- 
though the highest L/P ratios are obtained 
with the longer levers, each of the longer 
levers is also represented among the lower 
(unfavorable) L/P ratios. 
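
The variability measure in Table 1 is the root mean square (rms) of the standard deviations, as defined in the table footnote. A minimal sketch of that calculation is given here; the function name and the sample values are illustrative assumptions, not data from the study.

```python
import math

def rms_of_sds(sds):
    """Square root of the mean of the squared standard deviations."""
    if not sds:
        raise ValueError("at least one standard deviation is required")
    return math.sqrt(sum(s * s for s in sds) / len(sds))

# Hypothetical per-condition standard deviations, in seconds.
example_sds = [0.30, 0.40, 0.50]
print(round(rms_of_sds(example_sds), 3))
```

Because the values are squared before averaging, the rms weights large standard deviations more heavily than a simple mean would.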

When the data are re-analyzed to compare 
the favorable ratios (2.0 and up) with the 
unfavorable ratios (1.7 and down) for each 
of the 20 subjects individually, in all 20 sub- 
jects, there is a saving in setting time with 
the favorable ratios. In 19 of the 20 sub- 
jects, there is likewise a decrease in vari- 
ability, and in 18 of the 20 subjects a de- 
crease in mis-settings. 
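
The consistency across subjects can be gauged with a sign test. The authors do not report one, so the sketch below is an added illustration. It gives the one-sided probability of at least k of 20 subjects favoring the higher ratios if each subject were equally likely to favor either set.

```python
from math import comb

def sign_test_one_sided(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

# Counts reported in the text: setting time, variability, mis-settings.
for label, k in [("setting time", 20), ("variability", 19), ("mis-settings", 18)]:
    print(f"{label}: {k}/20 subjects, p = {sign_test_one_sided(k, 20):.2e}")
```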

When the data are analyzed according to 
the nine target positions, at each target po- 
sition there is a saving in setting time, a de- 
crease in variability, and a reduction in mis- 
settings with the favorable ratios. 


Table 1
Each Value is the Mean of 3,600 Settings (20 Subjects × 9 Target Positions × 20 Settings)

L/P      Lever     Mean Setting    Mean Variability    Mis-
Ratio    Length    Time            (rms of σ's)*       settings
1.0      12"       2.58 sec.       0.80 sec.           7.4%
1.0      18"       2.48 sec.       0.70 sec.           6.6%
1.4      24"       2.20 sec.       0.54 sec.           4.8%
1.6      12"       2.23 sec.       0.58 sec.           5.0%
1.7      30"       2.18 sec.       0.54 sec.           3.9%
2.0      18"       [illegible]     [illegible]         4.3%
2.0      24"       2.02 sec.       0.44 sec.           [illegible]
2.5      30"       1.99 sec.       0.40 sec.           [illegible]
2.7      24"       1.93 sec.       0.37 sec.           [illegible]
3.1      24"       1.92 sec.       0.35 sec.           [illegible]
3.4      30"       1.94 sec.       0.33 sec.           [illegible]
3.9      30"       1.95 sec.       0.34 sec.           [illegible]

* rms is the square root of the mean of the squares of the standard deviations, i.e., $\sqrt{(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2)/n}$.


Different Lever Lengths with Favorable L/P Ratios. The aim of the next set of experiments was two-fold: to determine whether lever-length as such was significant within the favorable L/P ratios, and to see whether there was any indication of an optimal ratio within the favorable region. Accordingly, lever lengths of 12, 18, 24, and 30 inches were employed with apparatus settings to give L/P ratios of 2.0, 2.5, and 3.0 (except that it was not possible to reach an L/P ratio of 3.0 with the 12" lever in the present apparatus). Target positions were restricted to the four on the three-inch diameter and the six on the seven-inch diameter. Each of 19 subjects made 20 settings at each of 10 positions using each of the 11 lever-ratio combinations.
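
The count of 11 lever-ratio combinations follows from crossing four lever lengths with three ratios and dropping the one combination the apparatus could not reach. A brief sketch of that bookkeeping is given here; the variable names are illustrative.

```python
from itertools import product

lever_lengths_in = [12, 18, 24, 30]
lp_ratios = [2.0, 2.5, 3.0]

# The 12-inch lever could not reach an L/P ratio of 3.0 in the apparatus.
unreachable = {(12, 3.0)}

combinations = [
    (lever, ratio)
    for lever, ratio in product(lever_lengths_in, lp_ratios)
    if (lever, ratio) not in unreachable
]

# Settings behind each lever-ratio value: subjects x target positions x settings.
settings_per_combination = 19 * 10 * 20

print(len(combinations), settings_per_combination)
```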

Table 2 shows the findings. It is evident 
that lever-length as such plays little or no 
part in the outcome. However, the L/P 
ratio of 2.5 is slightly superior to 2.0 in set- 
ting time, variability, and mis-settings, and 
inferior to 3.0 only in mis-settings. For con- 
venience the L/P ratio of 2.5, being lower, 
will be considered optimal. 

Table 2
Lever Length and Optimal L/P Ratio

Note: Each value is the mean of 3,800 settings (19 subjects × 10 target positions × 20 settings).

L/P      Lever     Mean Setting    Mean Variability    Mis-
Ratio    Length    Time            (rms of σ's)        settings
2.0      12"       1.71 sec.       0.32 sec.           2.2%
         18"       1.66 sec.       0.33 sec.           2.4%
         24"       1.64 sec.       0.31 sec.           2.4%
         30"       1.72 sec.       0.35 sec.           2.2%
2.5      12"       1.63 sec.       0.28 sec.           2.2%
         18"       1.59 sec.       0.26 sec.           2.1%
         24"       1.59 sec.       0.27 sec.           2.3%
         30"       1.63 sec.       0.30 sec.           2.0%
3.0      12"       (not obtainable with the present apparatus)
         18"       1.60 sec.       0.26 sec.           1.4%
         24"       1.57 sec.       0.25 sec.           1.7%
         30"       1.59 sec.       0.25 sec.           1.4%


Starting Positions. Up to this point the starting position of the cursor was always at the bottom of the simulated scope face. The question was raised whether this was the best starting position. In the next series of experiments, five starting positions were used: top, bottom, right, left, and center of the ten-inch diameter circle on which the outer seven inserts were located. Each of 12 subjects made 20 settings at each of 17 target positions (including the seven on the ten-inch diameter), using the L/P ratio of 2.5, and starting from each of the five positions. Average travel distance was thus the same for all starting positions.


Table 3
Influence of Starting Position on Performance
L/P ratio = 2.5   Lever = 24"

Note: Each value is the mean of 4,080 settings (12 subjects × 17 target positions × 20 settings).

Starting    Mean Setting    Mean Variability    Mis-
Position    Time            (rms of σ's)        settings
Top         2.08 sec.       0.38 sec.           2.2%
Bottom      1.99 sec.       0.39 sec.           2.9%
Right       2.01 sec.       0.38 sec.           1.9%
Left        2.01 sec.       0.40 sec.           2.6%
Center      1.88 sec.       0.36 sec.           3.2%


Table 3 shows the results. In terms of 
mean setting time and variability the center 
position is slightly superior. On the other 
hand, in percentage of mis-settings the center 
starting position is the worst of the five. 

In another analysis of the same data, the 
best starting position for each subject was de- 
termined in terms of each of the three cri- 
teria. In setting time, the center position is 
best for eight out of twelve subjects. In 
variability, the center position is best for five 
subjects. But in mis-settings no position 
stands out as being best. Overall, it seems
that starting position is relatively
unimportant.

Reversed Front-Back Operation. In nor- 
mal operation, the cursor moved upward 
when the joy-stick was pushed away from the 
subject and downward when the joy-stick was 
pulled toward the subject. A question was 
raised concerning the effect on the optimal 
L/P ratio if this operation was reversed so 
that the cursor moved upward when the joy- 
stick was pulled toward the subject and vice 
versa. 

Each of 17 subjects made 10 settings at each of the 10 inner target positions with each of five ratios (Trials 1-10), using the normal direction of operation. He then made 40 settings at each of the 10 target positions with each of five ratios (Trials 11-20, 21-30, 31-40, and 41-50), using the reversed direction of operation. Finally he made another 10 settings at each of the 10 target positions with each of the five ratios (Trials 51-60), returning to the normal direction of operation.

Table 4 shows by blocks of 10 trials the results in mean setting time, mean variability, and mis-settings. Two points can be noted: First, by the end of 40 trials with reversed operation (comprising 2,000 individual settings per subject) performance with reversed operation approached performance with direct operation, indicating that the subjects learned to handle what they all called an unnatural relationship of joy-stick and cursor movement. Second, for both conditions, an L/P ratio of 2.5 is the lowest that can be called optimal.

Subject’s Switch. In all the studies just 
described, the subject’s switch was held in 
the hand that was not operating the joy- 
stick. A question was raised as to whether 
other types of switching would affect the per- 
formance. Two other types of switches were 
added: A push-button was located at the top 
of the upper end of the joy-stick, operating 
with very light pressure. A foot-pedal, with 
enough spring resistance to bear the weight 
of the subject’s foot, was placed at a con- 
venient position on the floor. 


Table 4
Performance with Reversed Direction of Operation of Joy-stick and Cursor

Note: Each value is the mean of 1,700 settings (17 subjects × 10 target positions × 10 settings). Lever Length 24". Values for direct operation are in parentheses.

Mean Setting Time (seconds)
                            L/P Ratios
Trial Nos.   Operation   1.4      1.9      2.2      2.5            3.0
1-10         (Direct)    (1.81)   (1.65)   (1.68)   (1.66)         (1.64)
11-20        Reversed    2.16     2.05     2.09     [illegible]    2.03
21-30        Reversed    2.03     1.89     1.86     1.91           1.88
31-40        Reversed    1.95     1.82     1.80     1.78           1.80
41-50        Reversed    1.84     1.74     1.72     1.69           1.72
51-60        (Direct)    (1.72)   (1.58)   (1.58)   (1.56)         (1.52)

Mean Variability (rms of σ's in sec.)
                            L/P Ratios
Trial Nos.   Operation   1.4      1.9      2.2      2.5            3.0
1-10         (Direct)    (0.38)   (0.29)   (0.30)   (0.28)         (0.25)
11-20        Reversed    0.49     0.44     0.48     [illegible]    [illegible]
21-30        Reversed    0.42     0.34     0.34     0.35           0.33
31-40        Reversed    0.38     0.33     0.32     0.31           0.32
41-50        Reversed    0.36     0.30     0.30     0.31           0.29
51-60        (Direct)    (0.32)   (0.25)   (0.25)   (0.25)         (0.23)

Mis-settings (percentage)
                            L/P Ratios
Trial Nos.   Operation   1.4            1.9            2.2            2.5            3.0
1-10         (Direct)    [illegible]    [illegible]    [illegible]    [illegible]    [illegible]
11-20        Reversed    [illegible]    [illegible]    [illegible]    [illegible]    [illegible]
21-30        Reversed    8.6            [illegible]    [illegible]    [illegible]    [illegible]
31-40        Reversed    6.9            [illegible]    [illegible]    [illegible]    [illegible]
41-50        Reversed    6.9            [illegible]    [illegible]    [illegible]    [illegible]
51-60        (Direct)    (6.9)          [illegible]    [illegible]    [illegible]    [illegible]

Legible values in the first two rows of the mis-settings block, whose row and column placement cannot be recovered from the source: 11.5; (7.2%), (6.7%), (6.6%).




Table 5
Influence of Type of Switch on Performance

Note: Each figure is the mean of 5,100 settings (10 subjects × 17 positions × 30 settings). Ratio 2.5, Lever Length 24".

Type of          Mean Setting    Mean Variability    Mis-
Switch           Time            (rms of σ's)        settings
Other hand       1.46 sec.       0.19 sec.           8.9%
Joy-stick tip    1.47 sec.       0.19 sec.           10.1%
Foot pedal       1.47 sec.       0.19 sec.           8.2%


Each of 10 subjects made 30 settings at 
each of 17 target positions with each of the 
three types of switches. A 24” lever and an 
L/P ratio of 2.5 were employed throughout. 

Table 5 shows the results. It is evident 
that all three types of switches are about 
equal in terms of mean setting time, mean 
variability, and mis-settings. Apparently any 
one of these three types of switches, which- 
ever is most convenient, can be used without 
affecting performance. 
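
Each table note gives the number of settings behind every tabled value as the product of subjects, target positions, and settings per position. The short check below confirms that the stated totals agree with those products. The dictionary layout is an illustrative assumption; the figures are taken from the table notes and the text.

```python
from math import prod

# (subjects, target positions, settings per position) and the stated total.
table_notes = {
    "Table 1": ((20, 9, 20), 3600),
    "Table 2": ((19, 10, 20), 3800),
    "Table 3": ((12, 17, 20), 4080),
    "Table 4": ((17, 10, 10), 1700),
    "Table 5": ((10, 17, 30), 5100),
}

for name, (factors, stated_total) in table_notes.items():
    status = "agrees" if prod(factors) == stated_total else "DISAGREES"
    print(f"{name}: {factors} -> {prod(factors)} ({status})")

# Reversed-operation practice per subject: 40 settings x 10 targets x 5 ratios.
reversed_settings_per_subject = 40 * 10 * 5
print(reversed_settings_per_subject)
```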


Summary 


A series of experiments was performed to 
determine the significance of certain variables 
in the use of a joy-stick to make settings in 
two dimensions on a simulated scope face to 
a relatively coarse tolerance. 




The most significant factor turns out to be 
the ratio between the movement of the joy- 
stick tip and the movement of the cursor. 
The lowest ratio that can be considered opti- 
mal is about two-and-a-half. That is, the 
tip of the joy-stick should move two-and-a- 
half times as fast as the cursor. 
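
The ratio can be read as a simple gearing between hand and cursor. As a minimal sketch, assuming a fixed linear linkage (the function name is hypothetical), the cursor travel for a given movement of the joy-stick tip is:

```python
def cursor_travel(tip_travel, lp_ratio=2.5):
    """Cursor movement produced by a given joy-stick tip movement.

    lp_ratio is tip movement divided by cursor movement, so a larger
    ratio means finer control of the cursor.
    """
    if lp_ratio <= 0:
        raise ValueError("lp_ratio must be positive")
    return tip_travel / lp_ratio

# With the optimal ratio of 2.5, five inches of hand movement
# carries the cursor two inches across the scope face.
print(cursor_travel(5.0))
```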

Other variables proved to be relatively un- 
important. Joy-stick lengths of 12”, 18”, 24”, 
and 30” are equally effective. Starting posi- 
tion (top, bottom, right, left, or center of the 
scope) makes little difference in the overall 
results. Reversed operation (cursor moving 
down when stick is pushed away from the 
operator) is slower but the optimal ratio is 
the same. Finally, results are not affected by 
the position of the subject’s switch, whether 
it is operated by the hand not holding the 
joy-stick, by a foot-pedal, or by the same 
hand that moves the joy-stick. 

It should be emphasized that these results 
were obtained in a situation where the move- 
ment of the joy-stick is translated directly 
into movement of the cursor. The present 
type of apparatus does not permit making 
tests of a similar nature with joy-stick con- 
trols where the movement of the pointer is 
determined by pressure rather than by ex- 
tent of movement of the joy-stick. 


Received November 27, 1953. 


JOURNAL OF APPLIED PSYCHOLOGY
Vol. 38, No. 6, 1954


Figure and Ground in a Two Dimensional Display * 


R. C. Browne 


The Nuffield Department of Industrial Health, University of Durham,
King’s College, Newcastle upon Tyne 


In an aircraft the subjective feelings of 
passenger or pilot are little guide to the atti- 
tude which the machine assumes, and they 
are even less guide when it is flying in the 
dark or in cloud when there is nothing to 
which external reference can be made. An 
indicator (Figure 1) was therefore developed 
to provide the pilot with a visible display of 
how the attitude of his aircraft varies in relation
to an artificial horizon which is gyroscopically
stabilized. It shows in two dimensions
whether the aircraft is climbing, diving,
or banking and also, on a scale, the amount 
of bank, in degrees from the horizontal. This 
display provides a “figure and ground” prob- 
lem in that it is the horizon which apparently 
moves and not the aircraft. This does not, of 
course, accord with the facts, although it does 
with the appearance of the horizon as seen 
from the aircraft. Because of this, it was 
thought likely that air pilots often made 
wrong control movements and so increased 
the departure of their machine from the 
straight and level attitude. To meet these [...] was designed in [...] moved in ref[erence] [...]. The problem [...]

An initial examination of the two displays showed that they differed in three respects:

1. In the old method (Figure 1D) the “figure” or miniature aircraft is stationary, whereas in the new (Figure 1A) it moves against a “ground” composed of a horizon which is still.

2. The old display is provided with a scale and pointer which shows how many degrees the aircraft is banking to one side or the other, but in the reversed sense.

3. The old instrument is less heavily “damped” than the new; in other words, the new display takes rather longer to come to rest after a given deflection.

* Acknowledgments are due to the Medical Directorate, British Royal Air Force, for permission to publish this paper, and to Mr. H. Campbell, B.A., F.S.S., for statistical advice.


Method 


The classical method of studying a display problem is with a tachistoscope. But, on the other hand, where machinery is controlled in response to alterations in an indicator (as in the present study) and some movement in a control system has to be made, it is perhaps better to assess the different displays in a comparable way by requiring the experimental subjects to make control movements in response to changes in them, and to measure the speed and accuracy with which they do so.

A standard instrument flying trainer was, therefore, used as the machine to be controlled, and it was fitted with a recording apparatus which integrated the speed and accuracy with which deflections from the straight and level attitude were corrected. It gave a numerical score [every] two minutes. The test lasted for eleven minutes which allowed time for four such scores to [...] The ta[...] [ex]periment was [...] which were co[...] indicators. The hood of the trainer was shut, so that no fixed external reference point could be seen, and [...] turning movements, [...]

[...] at random from a large group who had already been selected to be air pilots and who [were], therefore, quite homogeneous, were the subjects of the experiment. But they were at a stage [...] training when they had had no experience in [...] attitude display indicators. In this way, bias due to familiarity with either display was avoided. Every man received a comparable explanation, and was allowed to practice until he could just do the test without damaging the apparatus. The test was kept short (eleven minutes) to avoid fatigue and fluctuating levels of attention.

Fig. 1. The two displays. The New (A) is on the left and the Old (D) on the right.


Table 1
The Numbers in Each Group of Subjects and the Order in Which They Were Tested on the Various Displays

Order                  Subjects
of Test    Display     No. of    Group Letter
1          D           20        a
           A           20        b
2          D           10        c
           A           10        d
           A           10        c
           D           10        d
3          C           20        e
4          B           20        f
Trained Air Pilots
5          D           10        g
           A           10        h
           A           10        g
           D           10        h


The experiment was divided into five parts as shown in Table 1.

1. Two groups of 20 subjects (a) and (b) were chosen. Group (a) was tested on the old indicator (D) and Group (b) on the new indicator (A).

2. Twenty new subjects were chosen and divided into two groups of ten (c) and (d). Group (c) was first tested on the old indicator (D) and then on the new indicator (A). Group (d) carried out the same two tests in the reverse order.

3. The old indicator (D) was partially covered with black paper to make it comparable to (A) in every respect except the figure and ground relation and the degree of damping. A new group of 20 subjects (e) was tested on this display (C).

4. The display (C) was further modified so that the damping was comparable to (A) and another group of 20 subjects (f) was tested on this second display (B). (B) now resembled (A) in every respect except the figure and ground relation.

5. Two groups of ten experienced pilots (g) and (h) who had trained on the old display (D) were chosen. Group (g) was first tested on the old indicator (D) and afterwards on the new indicator (A). Group (h) carried out the same two tests in the reverse order. For this experiment the damping was made comparable on the two displays.
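
The five-part design can be summarized as a small data structure, which makes the group sizes and test orders easy to check. The layout below is an illustrative sketch; the group letters, sizes, and display orders are those given in the five parts above.

```python
# Each entry: experiment part, group letter, group size, displays in test order.
design = [
    {"part": 1, "group": "a", "n": 20, "order": ["D"]},
    {"part": 1, "group": "b", "n": 20, "order": ["A"]},
    {"part": 2, "group": "c", "n": 10, "order": ["D", "A"]},
    {"part": 2, "group": "d", "n": 10, "order": ["A", "D"]},
    {"part": 3, "group": "e", "n": 20, "order": ["C"]},
    {"part": 4, "group": "f", "n": 20, "order": ["B"]},
    {"part": 5, "group": "g", "n": 10, "order": ["D", "A"]},
    {"part": 5, "group": "h", "n": 10, "order": ["A", "D"]},
]

total_subjects = sum(g["n"] for g in design)

def first_test_count(display, parts):
    """Number of subjects in the given parts whose first test used `display`."""
    return sum(g["n"] for g in design if g["part"] in parts and g["order"][0] == display)

# Untrained subjects meeting each display first (parts 1 and 2 combined).
first_on_old = first_test_count("D", {1, 2})
first_on_new = first_test_count("A", {1, 2})
print(total_subjects, first_on_old, first_on_new)
```

Parts 1 and 2 together give two comparable sets of 30 untrained subjects whose first test was on the old and on the new display respectively.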


Results 


Figure 2 plots the means of the four scores 
for every subject for the roll or side-to-side 
movements, and Figure 3 shows similar scores 
for the pitch or fore and aft movements. In 
parts 1 and 2 of the experiment 30 subjects 
(groups a and c, Table 1) had their first test 
on the old display (D) and another compa- 
rable 30 (groups b and d) on the new (A). 
It was found that 5.0 fewer errors were made 
in roll on (A) than on (D) which seems un- 
likely to be due to chance, since t = 2.75 and 
P=1 in 100. In pitch the difference is 
smaller (1.5 errors) as, indeed, were the dis- 
turbances in attitude to be corrected. But 
here too, fewer errors are made with the new 
display (A). Taken alone, this might be due 
to chance, but the difference is in the same 
direction as the difference in roll which lends, 
therefore, a certain weight to it. The instruc- 
tion times needed before the subjects were fit 
to start the test and their preferences for the 
two displays, are shown in Table 2. These 
two criteria were measured for 20 of the 30 
men in each group, and show that significantly 
less instruction time was needed in the case 
of the new display (17 compared to 23.5 min- 
utes) which was also subjectively preferred 
by between six and seven times as many of 
the men (33 compared to 5) who were tested 
upon it. 
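
The two comparisons quoted from Table 2 reduce to simple arithmetic, sketched here with illustrative variable names and the figures given in the text.

```python
# Mean instruction time in minutes, from Table 2.
old_minutes, new_minutes = 23.5, 17.0
time_saved = old_minutes - new_minutes

# Stated preferences of the men questioned, from Table 2.
prefer_old, prefer_new, prefer_neither = 5, 33, 2
men_questioned = prefer_old + prefer_new + prefer_neither
preference_ratio = prefer_new / prefer_old

print(time_saved, men_questioned, round(preference_ratio, 1))
```

The 40 men questioned correspond to 20 from each of the two groups of 30, and the preference ratio of 6.6 is the "between six and seven times" of the text.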

Table 2
The Length of Instruction Time Needed Before the Test Could be Started and the Subjective Preferences for the Two Displays

                                      Display
                                Old       New       Neither
Instruction Time (minutes)
  Mean                          23.5      17.0
  Difference                    6.5 ± 1.84
Number of Men with
  Preference for                5         33        2


The results with displays (C) and (B) in roll with fresh groups of subjects fall into intermediate positions between the other two as, indeed, did the displays themselves. In pitch (Figure 3), however, there was little to choose between the results given by the different designs.

Fig. 2. Each subject’s individual scores in roll and the means for the groups. (Horizontal axis: errors in roll.)

The relations between the means and stand- 
ard deviations of these figures on the four 
types of display are of some interest, and 
they are shown in Table 3 and Figure 4. In 
the roll dimension, as the display becomes 
easier to interpret through the sequence D, 
C, B, and A, and errors fall from 29.7 to 24.7, 
so the scatter between subjects falls also from 
8.5 to 5.1. But the scatter within a given 
subject’s performance remains much more 
constant at between 3.3 and 4.3 errors. The 
figures for pitch demonstrate the same trend 
less markedly. As the test becomes harder it 
magnifies the individual differences, but it 
does not appear to make the performance of 
a single man more erratic. 
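
The claim that scatter between subjects rises with the mean error can be illustrated with a correlation over the four displays. The sketch below uses only the roll means and between-men standard deviations read from Table 3. Because it covers the roll rows alone, it is not intended to reproduce the coefficients printed on Figure 4.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Roll dimension, displays A, B, C, D (values as read from Table 3).
roll_mean_errors = [24.7, 25.2, 28.4, 29.7]
roll_sd_between_men = [5.1, 5.6, 7.6, 8.5]

r_between = pearson_r(roll_mean_errors, roll_sd_between_men)
print(f"roll, between men: r = {r_between:.3f}")
```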

Fig. 3. Each subject’s individual scores in pitch and the means for the groups. (Horizontal axis: errors in pitch.)


Table 3
[Title illegible in source. Means and standard deviations of errors, between men and within a man, on the four displays.]

                    Roll                                         Pitch
           Mean      S.D.           S.D.            Mean           S.D.           S.D.
Display    Errors    Between Men    Within a Man    Errors         Between Men    Within a Man
New A      24.7      5.1            [illegible]     [illegible]    [illegible]    [illegible]
Old B      25.2      5.6            3.5             [illegible]    [illegible]    [illegible]
Old C      28.4      7.6            [illegible]     20.7 (?)       5.8 (?)        [illegible]
Old D      29.7      8.5            4.3             22.1           6.7            [illegible]


Fig. 4. The relationship of the means to the standard deviations between and within subjects. (Horizontal axis: mean errors, roll and pitch; vertical axis: standard deviation. Plotted annotations: between men, r = 0.991; within a man, r = 0.248.)


Ten subjects (groups c and d, Table 1) were tested upon each of the two displays after previous experience with the other, to satisfy the desire for a “double” experimental design and to investigate the question of transfer. Errors were again fewer with display (A) when first test was compared with first, and second with second (Table 4). There is positive transfer from test to test (Table 5), whichever of the two came first, but with a difference in degree. Previous experience with display (D) stood the subjects in better stead than previous experience with (A) and this was more marked in the roll dimension in which the disturbances to be corrected were the greater. The positive transfer from (D) to (A) in roll was four times as great as in the reverse direction, and in pitch it was in the ratio of 1.6:1. This is to be expected if the new display (A) is easier to read than the old (D), and it makes the point that in this type of problem it is unsafe, in designing the experiment, to assume an equal amount of transfer in both directions.
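
The transfer figures of Table 5 can be recovered from the group means of Table 4 by subtracting each group's second-test errors from its first-test errors. The sketch below makes that step explicit. The dictionary layout is an illustrative assumption; the numbers are those of Table 4.

```python
# Mean errors from Table 4: group c took display D first and then A,
# group d took display A first and then D.
table4 = {
    "c": {"first": {"pitch": 20.7, "roll": 29.2},   # old display (D)
          "second": {"pitch": 14.2, "roll": 17.2}}, # new display (A)
    "d": {"first": {"pitch": 20.3, "roll": 23.9},   # new display (A)
          "second": {"pitch": 16.3, "roll": 20.8}}, # old display (D)
}

def transfer(group):
    """Reduction in mean errors from a group's first test to its second."""
    first, second = table4[group]["first"], table4[group]["second"]
    return {dim: round(first[dim] - second[dim], 1) for dim in ("pitch", "roll")}

d_to_a = transfer("c")  # experience with (D), then tested on (A)
a_to_d = transfer("d")  # experience with (A), then tested on (D)

print("D to A:", d_to_a)
print("A to D:", a_to_d)
print("roll ratio:", round(d_to_a["roll"] / a_to_d["roll"], 1))
print("pitch ratio:", round(d_to_a["pitch"] / a_to_d["pitch"], 1))
```

The roll ratio of about 3.9 and the pitch ratio of about 1.6 correspond to the "four times" and "1.6:1" quoted in the text.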

Table 4
Fewer Errors with One Display after Experience with the Other

                              Display
Subjects'         Old (D)            Test          New (A)
Group Letter      Pitch    Roll      Sequence      Pitch    Roll
c                 20.7     29.2      D then A      14.2     17.2
d                 16.3     20.8      A then D      20.3     23.9


Table 5
Positive but Unequal Transfer between the Displays

Transfer       Pitch     Roll
(D) to (A)     +6.5      +12.0
(A) to (D)     +4.0      +3.1


In the final part of the experiment, two groups (g) and (h) of ten air pilots who had about 300 hours experience with the old display were tested on both displays in alternate order. Both groups had more men who preferred the new design (Table 6), but the differences in performance, while they slightly favored this design in pitch, were small and might have been due to chance. It is, perhaps, noteworthy that with so much previous practice with the old design (D) they were not worse than they were when tested with the new (A).


Table 6
The Experience, Preference and Performance of the Trained Subjects

Group of 10 Subjects                      g              h
Experience (hrs.)                         318            292
No. with preference for:   New (A)        7              7
                           Old (D)        2              [illegible]
                           Neither        1              [illegible]
Errors in:   Pitch:   New (A)             9.4            9.7
                      Old (D)             10.3           11.2
             Roll:    New (A)             11.8           12.2
                      Old (D)             [illegible]    11.5


Discussion 


Craik (2) has pointed out that when an 
air pilot has a good unobstructed view of the 
external world, as in day flying, the aircraft 
can be considered to be an extension of his 
body which moves with him and which he can 
orientate in direct reference to the back- 
ground of the external horizon. But if, on 
the other hand, the view is obstructed or ab- 
sent, as at night or in cloud, the interior of 
the machine itself becomes the external en- 
vironment or background, and the pilot is 
then faced with the paradoxical fact that he 
and his background remain relatively fixed 
however he manipulates the controls. Craik, 
therefore, suggested a much larger representa- 
tion of the moving horizon, as in the old atti- 
tude indicator (D), as a way out of this 
situation. However, this may not entirely 
ensure that emergence of figure from ground, 
which forms the essential part of perception 
in this kind of display (Vernon, 4). Where 
the contrast between figure and ground is 
small the results of the present study suggest 
that it does not matter which of the two is 
moving and which fixed. But from the point 
of view of the immediate ad hoc problem of 
whether the aircraft or the horizon should 
move it can be argued a priori that it should 
be the aircraft. According to Rubin’s classification
(Woodworth, 5) the aircraft has the
characteristics of the “figure” rather than of
the “ground,” because: (1) it has form while
the horizon bar is relatively formless; (2)
the aircraft tends to appear in front, the
horizon behind; and (3) the aircraft is more
impressive and “more apt to suggest
meaning.”

In the design of any experiment of this kind to investigate two different displays, two difficulties have to be considered: (1) the comparability of the groups of subjects used; and (2) performance transfer, either positive or negative, from one test to the other.

If one group of subjects tested on one display is compared with another group on a [second] display, any observed difference may, on the face of it, be either an intrinsic function of the group or of the display. The use of large groups to which the subjects are allocated at random after careful matching by some analogous test, in theory, helps to ensure their equivalence. But, in practice, the matching has to be demonstrably analogous to the problem in hand. It is not safe merely to assume or to assert this; neither is it easy usually to demonstrate this analogy, which at best means a lengthy piece of experimental work. The alternative design in which one half of the subjects is tested on one display and then on the other, and the other half of the subjects vice versa, can equally well be criticized on the ground that the second tests are only comparable if the amount of transfer is the same in both directions, which may well be unlikely (as in the present study), if one display is easier to perceive than the other. The conclusion seems to be that the experimental design must be arbitrary to a certain extent, and that the most secure design is, perhaps, a combination of both these methods.

In an experiment which is generally comparable to that described here, Loucks (3) showed that a display having a reversed [word illegible] to that of the “old” (D) indicator described in this paper produced a greater speed and accuracy of response than did one similar to the old indicator itself. He was also using subjects with no previous experience who appeared to identify themselves with the moving component of the display irrespective of its appearance. He suggests that it would be even better if the moving component were drawn in the shape of a small aircraft. But an experiment with this type of display was not, in fact, tried, and the present study suggests that this change might have made little difference. However, the numbers of subjects used in it were relatively small and the subject must still be considered open. It seems clear that in order to alter the ease of perception, figure and ground must contrast in qualities other than mere relative movement, which alone seems unimportant.




Summary 


1. The speed and accuracy of human re- 
sponse to two displays which give information 
in two dimensions has been compared. Com- 
parisons of the instruction times needed be- 
fore the test could be started, and of the 
preferences of the subjects, have also been 
made. 

2. The two displays differed in respect of: 
(i) the relation of figure and ground and their 
relative movement; (ii) the damping of the 
oscillations after a given displacement; and 
(iii) their relative complication.

3. The speed and accuracy of response was 
greater with the more simple display which 
had the heavier damping. This display also 
needed a shorter instruction time and was 
preferred more often by the subjects. Within 
the limits of the experimental design em- 
ployed the pure figure and ground relation 
alone appeared to play little part in percep- 
tion. 

4. The individual differences between subjects increased as perception became more difficult. But the differences between different samples of a single subject’s performance remained constant.

5. In an experimental design learning trans- 
fer between different tests must not be con- 
sidered to be the same. Neither is matching 
groups of subjects on an assumed analogous 
test experimentally safe. 


Received November 24, 1953. 


References 


1. Browne, R. C. Comparative trial of two attitude indicators. Royal Air Force Flying Personnel Research Committee Reports Nos. 611 and 611a, Feb. and April, 1945.

2. Craik, K. J. W. Figure and ground in control of aircraft. (Unpublished), 1944.

3. Loucks, R. B. An experimental evaluation of the interpretability of various types of aircraft attitude indicators. Psychological Research on Equipment Design, Report No. 19, p. 111. Washington: U. S. Government Printing Office, 1947.

4. Vernon, M. D. Visual perception. Cambridge University Press, 1937.

5. Woodworth, R. S. Experimental psychology. London: Methuen, 1950.


New Books, Monographs, and Pamphlets


Books, monographs, and pamphlets for listing and possible review should be sent to Dr. John G. Darley, Editor-elect, Graduate School, University of Minnesota, Minneapolis 14, Minnesota.


[Perso]nnel practices for college and uni[versity] [...] clerical workers. Wilbur Donald Albright. Champaign, Ill.: College and University Personnel Association, University of Illinois, [date illegible]. Pp. 131. $2.00.

[Title partly illegible] the Y.M.C.A. Seth Arsenian and Francis W. McKenzie. New York: Association [Pres]s, 1954. Pp. 126. $2.00.

The social psychology of industry. J. A. C. Brown. Baltimore: Penguin Books, Inc., 1954. Pp. 309. [Price illegible], Paperback.

[Title illegible]. Paul Campbell and Peter Howard. New York: Arrowhead Books, Inc., 1954. Pp. 126.

[...] in general psychology. Lester D. Crow and Alice Crow. New York: Barnes & Noble, Inc., 1954. Pp. 437. $1.75, Paperback.

Dark destiny. Edgar E. Daniels. New York: Vantage Press, Inc., 1954. Pp. 172. $3.00.

Adjusting to a competitive economy: the human problem. M. J. Dooher, Editor. New York: American Management Association, 1954. Pp. 48. $1.25.

[...] theory of knowledge. Peter Fireman. New York: Philosophical Library, 1954. Pp. 50. $2.75.

Psychometric methods. Second Edition. J. P. Guilford. New York: McGraw-Hill Book Company, Inc., 1954. Pp. 597. $8.50.

Nebraska symposium on motivation. Marshall R. Jones, Editor. Lincoln: University of Nebraska Press, 1954. Pp. 322. $3.50, Cloth; $3.00, Paperback.

Conflict and mood. Patricia Kendall. Glencoe, Ill.: The Free Press, 1954. Pp. 182. $3.50.

Psychomotor aspects of mental disease: An experimental study. H. E. King. Cambridge, Mass.: Harvard University Press, 1954. Pp. 185. $3.50.

Industrial conflict. Arthur Kornhauser, Robert Dubin, and Arthur M. Ross, Editors. New York: McGraw-Hill Book Company, Inc., 1954. Pp. 551. $6.00.

Mathematical thinking in the social sciences. Paul F. Lazarsfeld, Editor. Glencoe, Ill.: The Free Press, 1954. Pp. 444. $10.00.

The sexual nature of man and its management. Clarence Leuba. New York: Doubleday and Company, Inc., 1954. Pp. 40. $.85.

Effective leadership in human relations. Henry Clay Lindgren. New York: Hermitage House, Inc., 1954. Pp. 287. $3.50.

A psychological approach to accidents. Norman Roberts Lykes. New York: Vantage Press, Inc., 1954. Pp. 138. $2.95.

The encyclopedia of child care and guidance. Sidonie Matsner Gruenberg, Editor. New York: Doubleday and Company, Inc., 1954. Pp. 101[...]. $7.50.

Studying and learning. Max [surname illegible]. [...] Doubleday and Company, Inc., [...]. $.95.

School and child: A case history. Cecil [surname illegible]. East Lansing: Michigan State College Press, 19[...]. Pp. 221. $3.75.

Aspects of readability in the social [studies]. [Eleanor] M. Peterson. New York: Bureau [of Publications], Teachers College, Columbia University, [date illegible]. Pp. 118. $3.50.

Psychotherapy and personality change. Carl R. Rogers and Rosalind F. Dymond [...]. Chicago: University of Chicago Press, [date illegible]. Pp. 445. $6.00.

Basic concepts in vocational guidance. [Herbert] Sanderson. New York: McGraw-Hill [Book Co]mpany, Inc., 1954. Pp. 338. $4.50.

The laws of life. Adrian Waldo Sasha [surname uncertain]. San Francisco: Living Knowledge Foundation, [date illegible]. [Pp.] 224. $5.00.

The real enjoyment of living. [Author illegible]. New York: E. P. Dutton [...]. Pp. 192. $2.75.

The mind and the universe. [Author illegible]. New York: The William-Frederick Press, [date illegible]. Pp. 173. $3.50.

Decision-making as an approach to the study of international politics. Richard C. Snyder, [H. W.] Bruck, and Burton Sapin. Princeton, N. J.: Organizational Behavior Section, Princeton University, 1954. Pp. 120.

An inventory of social and economic research [in] health. Frederick R. Strunk, Editor. New York: Health Information Foundation, 1954. Pp. 180[...].

The prediction of student-teaching success from personality inventories. Fred T. Tyler. Berkeley: University of California Press, 1954. Pp. 3[...]. $12.50.

Psychology as a profession. Robert I. Watson. New York: Doubleday and Company, Inc., 195[4]. Pp. 65. $.95.

Human engineering guide for equipment designers. Wesley E. Woodson. Berkeley: University of California Press, 1954. Pp. 259. $3.50.

Lotteries-for-housing. Martin Zethfield. New York: The William-Frederick Press, 1954. Pp. 26. $1[...].


Correction

In the New Books section of the August issue of the Journal of Applied Psychology on page 282 the price of Anastasi’s Psychological Testing was listed as $4.25. The actual list price is $6.75.


